If you already call chat.completions.create or embeddings.create via the official OpenAI SDK, you can switch to Aurous Labs by changing two lines: the baseURL and the apiKey. Streaming, tool calls, structured output, multimodal input, idempotency, and rate-limit headers all work without modification.
This guide walks through the exact diff for Node, Python, and a few popular framework integrations.
The two-line change
The rest of the SDK surface (client.chat.completions.create, client.embeddings.create, client.chat.completions.stream, client.models.list) works against Aurous without further changes.
Model names
Aurous models use the aurous-* prefix. Swap your model identifier:
| OpenAI model | Aurous Labs equivalent | Notes |
|---|---|---|
| gpt-4o | aurous-grow-2.0-pro | Multimodal, tool-capable, reasoning-capable, 256K ctx |
| gpt-4o-mini | aurous-grow-2.0-pro | Day-1 only ships one chat model; mini tier coming |
| text-embedding-3-* | aurous-embed-vision-1.0 | Text + image + video → single combined vector, 128K ctx |
The full model list is available from GET /v1/models. Each model row includes a top-level kind (chat or embedding) and an aurous_metadata extension with the context window, supported dimensions, capability flags, and per-modality credit rates.
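A minimal sketch of the model swap, using only the mappings in the table above. The helper name to_aurous_model and the prefix rule for text-embedding-3-* are illustrative assumptions, not part of any SDK.

```python
# Hypothetical helper: maps OpenAI model identifiers to the Aurous
# equivalents listed in the table above. Not part of any SDK.
AUROUS_CHAT_MODELS = {
    "gpt-4o": "aurous-grow-2.0-pro",
    "gpt-4o-mini": "aurous-grow-2.0-pro",  # only one chat model ships on day 1
}
OPENAI_EMBEDDING_PREFIX = "text-embedding-3-"
AUROUS_EMBEDDING_MODEL = "aurous-embed-vision-1.0"


def to_aurous_model(openai_model: str) -> str:
    """Return the Aurous model name for a known OpenAI model name."""
    if openai_model.startswith("aurous-"):
        return openai_model  # already an Aurous identifier, pass through
    if openai_model.startswith(OPENAI_EMBEDDING_PREFIX):
        return AUROUS_EMBEDDING_MODEL
    if openai_model in AUROUS_CHAT_MODELS:
        return AUROUS_CHAT_MODELS[openai_model]
    # Fail loudly instead of guessing a mapping the table does not list.
    raise ValueError(f"no Aurous equivalent listed for {openai_model!r}")
```

Raising on unknown names is a deliberate choice here: a silent fallback would hide models that have no listed equivalent.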
What works without changes
Chat completions (sync)
The usage object carries two Aurous extensions: usage.credits_charged (a number) and usage.breakdown (per-token math for cost reconciliation). If your code only reads usage.total_tokens, you never see the extensions; if it logs the whole usage object, they show up as extra fields you can ignore.
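Code that has to run against both providers can read the extension defensively. The sketch below assumes the usage block has already been converted to a plain dict; the helper name credits_charged and the sample values are made up for illustration.

```python
from typing import Any, Mapping, Optional


def credits_charged(usage: Mapping[str, Any]) -> Optional[float]:
    """Return usage.credits_charged as a float, or None when it is absent."""
    value = usage.get("credits_charged")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


# Made-up sample payloads, for illustration only.
aurous_usage = {"prompt_tokens": 12, "completion_tokens": 30,
                "total_tokens": 42, "credits_charged": 0.5}
openai_usage = {"prompt_tokens": 12, "completion_tokens": 30,
                "total_tokens": 42}
```

With the Python SDK, response.usage.model_dump() should produce a dict of this kind, since the SDK's response models are pydantic objects.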
Streaming chat completions
The stream is a sequence of server-sent events (data: {<chunk-json>}\n\n) terminated by data: [DONE]\n\n. The final non-[DONE] chunk carries the usage block with credits_charged.
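The SDK handles this framing for you, but if you consume the stream by hand, the wire format above can be parsed roughly as follows. This is a sketch, not a full SSE parser: it assumes each event is a single data: line, and the sample stream and its values are invented for the example.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded chunk objects from 'data: ...' lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # terminator: stop without yielding it
        yield json.loads(payload)


def final_usage(lines: Iterable[str]) -> Optional[Dict[str, Any]]:
    """Return the usage block of the last chunk before [DONE], if present."""
    last: Optional[Dict[str, Any]] = None
    for chunk in iter_sse_chunks(lines):
        last = chunk
    return last.get("usage") if last else None


# Invented sample stream, for illustration only.
SAMPLE_STREAM = (
    'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    'data: {"choices": [], "usage": {"total_tokens": 5, "credits_charged": 0.01}}\n\n'
    'data: [DONE]\n\n'
)
```

Calling final_usage(SAMPLE_STREAM.splitlines()) walks both chunks and returns the usage dict from the second one.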
Tool calling
tool_choice accepts the same shapes OpenAI does: "none", "auto", "required", and { "type": "function", "function": { "name": "..." } }.
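As a quick reference, the four accepted shapes can be checked client-side before a request is sent. The function name is_valid_tool_choice is hypothetical, and the check covers only the shapes listed above.

```python
from typing import Any


def is_valid_tool_choice(value: Any) -> bool:
    """True if value is one of the four tool_choice shapes listed above."""
    if isinstance(value, str):
        return value in {"none", "auto", "required"}
    if isinstance(value, dict):
        function = value.get("function")
        return (
            value.get("type") == "function"
            and isinstance(function, dict)
            and isinstance(function.get("name"), str)
            and function["name"] != ""
        )
    return False
```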
Structured output (JSON schema)
response_format: { type: "json_object" } is also accepted (looser; it only guarantees parseable JSON). See Structured output for the schema constraints we enforce.
Embeddings (text)
Models listing
Each row includes a top-level kind (chat or embedding) and an aurous_metadata extension. The OpenAI TypeScript types don’t know about those fields, so cast the row to any or extend the type yourself if you want compile-time access.
What needs an extra line
Multimodal embedding input
OpenAI’s embeddings.create accepts only string or string[] for input. To send multimodal content parts (text + image + video), use the SDK’s lower-level client.post() escape hatch:
Batch embeddings (input: ["a", "b", "c"])
OpenAI accepts input: ["a", "b", "c"] as a batched 3-vector return. Aurous does NOT — the underlying multimodal embedding model concatenates the inputs into a single combined vector, which is the opposite of OpenAI’s N→N semantics. Calls with input: string[] are rejected with 400 embeddings_batch_not_supported.
The OpenAI-shaped workaround is to loop client-side. See OpenAI batch incompat for the rationale and the recommended loop pattern.
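One way the client-side loop could look, sketched with an injected embed_one callable so that it runs without a network connection. The names embed_each and embed_one are assumptions for this example, and this is not necessarily the loop pattern the linked page recommends.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_each(texts: Sequence[str],
               embed_one: Callable[[str], Vector]) -> List[Vector]:
    """Embed inputs one request at a time, restoring N -> N semantics.

    embed_one sends a single string and returns its vector. With the
    OpenAI Python SDK it could be written as:
        lambda text: client.embeddings.create(
            model="aurous-embed-vision-1.0", input=text
        ).data[0].embedding
    """
    # One call per input keeps output order aligned with input order.
    return [embed_one(text) for text in texts]
```

The loop is sequential on purpose, which keeps it simple but slower than a true batch; running the calls concurrently is possible as long as it stays within the RPM and TPM buckets described under Headers and rate limits.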
Headers and rate limits
Aurous response headers are a superset of OpenAI’s. Everything OpenAI emits is preserved; we add:
- Aurous-Request-Id: a req_<ulid> identifier, for support
- Aurous-Version: the API version applied to this response
- X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset: standard per-team RPM bucket
- X-RateLimit-TPM-Limit / X-RateLimit-TPM-Remaining / X-RateLimit-TPM-Reset: additional per-team tokens-per-minute bucket on chat + embedding routes
- Aurous-Idempotent-Replayed: set to true, and present only when the response is a replay of a prior call with the same Idempotency-Key
Read these headers from the raw response object via the SDK’s with_raw_response (Python) or withResponse (Node) helpers.
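Once you have a headers mapping in hand, the Aurous additions can be picked out case-insensitively. In this sketch the helper names aurous_headers and was_replayed are hypothetical, and values stay as strings because the exact format of the reset headers is not specified here.

```python
from typing import Dict, Mapping, Optional

AUROUS_HEADER_NAMES = [
    "Aurous-Request-Id",
    "Aurous-Version",
    "X-RateLimit-Limit",
    "X-RateLimit-Remaining",
    "X-RateLimit-Reset",
    "X-RateLimit-TPM-Limit",
    "X-RateLimit-TPM-Remaining",
    "X-RateLimit-TPM-Reset",
    "Aurous-Idempotent-Replayed",
]


def aurous_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Return the headers listed above; missing ones map to None."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name.lower()) for name in AUROUS_HEADER_NAMES}


def was_replayed(headers: Mapping[str, str]) -> bool:
    """True when the response is flagged as an idempotent replay."""
    return aurous_headers(headers)["Aurous-Idempotent-Replayed"] == "true"
```

With the Python SDK, the mapping would typically come from something like client.chat.completions.with_raw_response.create(...).headers, with .parse() on the same object returning the usual completion.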
Framework integrations
LangChain (Python)
LangChain (Node)
LlamaIndex (Python)
Vercel AI SDK
The AI SDK helpers (including streamText and generateObject) work without modification.
When OpenAI compatibility ends
A few things are intentionally NOT compatible:
- Image generation: Aurous uses its own POST /v1/images surface (LoRA-based generation, async polling, anonymous-read output URLs). OpenAI’s images.generate SDK call won’t work, and we wouldn’t want it to, because the surfaces are fundamentally different (sync DALL-E vs. async LoRA-pipeline).
- Fine-tuning: Aurous fine-tunes are LoRAs, not full-weight fine-tunes. They’re registered against the image generation surface, not the chat surface. The OpenAI fine-tuning SDK won’t work.
- Audio (speech, transcription): not yet on the platform.
- Assistants / Threads API: not yet on the platform. The chat completions surface is stateless; if you need threading, build it client-side.
Where to next?
- Chat completions: the full chat surface
- Embeddings: the full embedding surface
- Multimodal embeddings: the differentiator
- How we count tokens: tokenizer details and the difference vs OpenAI’s tiktoken
- Cost transparency: reading the credits_charged + breakdown story
- Idempotency: safe retries on every modality

