Drop in for OpenAI

If your application talks to chat.completions.create or embeddings.create via the official OpenAI SDK, you can switch to Aurous Labs by changing two lines in the client constructor (the baseURL and the apiKey) and swapping the model name. Streaming, tool calls, structured output, multimodal input, idempotency, and rate-limit headers all work without modification. This guide walks through the exact diff for Node and Python, plus a few popular framework integrations.

The two-line change

// Before — OpenAI
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

// After — Aurous Labs
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!, // al_live_xxxxxxxxxxxxxxxx
});

# Before — OpenAI
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

# After — Aurous Labs
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key=os.environ["AUROUS_API_KEY"],  # al_live_xxxxxxxxxxxxxxxx
)

That’s it for client setup. The calls you were already making (client.chat.completions.create, client.embeddings.create, client.chat.completions.stream, client.models.list) work against Aurous once you pass an Aurous model name.

Model names

Aurous models use the aurous-* prefix. Swap your model identifier:

OpenAI model	Aurous Labs equivalent	Notes
`gpt-4o`	`aurous-grow-2.0-pro`	Multimodal, tool-capable, reasoning-capable, 256K ctx
`gpt-4o-mini`	`aurous-grow-2.0-pro`	Only one chat model ships at launch; a mini tier is coming
`text-embedding-3-*`	`aurous-embed-vision-1.0`	Text + image + video → single combined vector, 128K ctx

The full catalog is at GET /v1/models. Each model row includes a top-level kind (chat or embedding) and an aurous_metadata extension with the context window, supported dimensions, capability flags, and per-modality credit rates.

What works without changes

Chat completions (sync)

const completion = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  max_tokens: 64,
});

console.log(completion.choices[0].message.content);
console.log(completion.usage); // includes prompt_tokens, completion_tokens, total_tokens, credits_charged

The response shape is identical to OpenAI’s, with two Aurous additions on the usage object: usage.credits_charged (a number) and usage.breakdown (per-token math for cost reconciliation). Code that reads only usage.total_tokens never sees them; code that logs the whole usage object gets them as extra fields it can safely ignore.
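The same usage object can be read defensively so one code path handles both providers. This is a minimal sketch, assuming the usage block has been converted to a plain dict (for example with usage.model_dump() in the Python SDK). The helper name summarize_usage and the sample numbers are illustrative, not part of the API.

```python
def summarize_usage(usage):
    """Return the standard token count plus the Aurous credit extension, if present."""
    return {
        "total_tokens": usage["total_tokens"],
        # `credits_charged` is an Aurous extension; it is absent on OpenAI responses.
        "credits_charged": usage.get("credits_charged"),
        "has_breakdown": "breakdown" in usage,
    }


# Hypothetical usage payloads; the numbers and the empty breakdown are made up.
sample_usage = {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32,
    "credits_charged": 0.4,
    "breakdown": {},
}
openai_usage = {"prompt_tokens": 24, "completion_tokens": 8, "total_tokens": 32}

aurous_summary = summarize_usage(sample_usage)
openai_summary = summarize_usage(openai_usage)
print(aurous_summary)
print(openai_summary)
```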

Streaming chat completions

const stream = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [{ role: "user", content: "Write a haiku about latency." }],
  max_tokens: 256,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

We emit SSE frames in OpenAI’s exact shape (data: {<chunk-json>}\n\n) terminated by data: [DONE]\n\n. The final non-[DONE] chunk carries the usage block with credits_charged.
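If you consume the stream without the SDK, the framing described above can be parsed by hand. The following is a rough sketch using only the standard library. The sample body is invented, and it assumes the usage-carrying chunk has an empty choices array, which is how OpenAI shapes it but is not confirmed here for Aurous.

```python
import json


def iter_sse_chunks(raw):
    """Yield each decoded chunk from an SSE body, stopping at the [DONE] sentinel."""
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data:"):
            continue  # skip blank frames and anything that is not a data line
        payload = frame[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Hypothetical stream body; the field values are made up for illustration.
raw_body = (
    'data: {"choices": [{"delta": {"content": "Low "}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "ping"}}]}\n\n'
    'data: {"choices": [], "usage": {"total_tokens": 12, "credits_charged": 0.5}}\n\n'
    "data: [DONE]\n\n"
)

text_parts = []
usage = None
for chunk in iter_sse_chunks(raw_body):
    for choice in chunk.get("choices", []):
        text_parts.append(choice.get("delta", {}).get("content") or "")
    # Keep the most recent usage block; it is expected on the last chunk only.
    usage = chunk.get("usage") or usage

text = "".join(text_parts)
print(text, usage)
```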

Tool calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["city"],
            },
        },
    },
]

res = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)
print(res.choices[0].message.tool_calls)

tool_choice accepts the same shapes OpenAI does: "none", "auto", "required", and { "type": "function", "function": { "name": "..." } }.
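Once the model returns tool_calls, your code still has to run the tool and send the result back as a tool message. Below is a small sketch of that dispatch step, assuming the OpenAI tool-call shape (id, function.name, and function.arguments as a JSON string). The get_weather implementation, the LOCAL_TOOLS registry, and the fake call object are all hypothetical stand-ins.

```python
import json
from types import SimpleNamespace


def get_weather(city, unit="c"):
    """Hypothetical local implementation; a real one would call a weather service."""
    return {"city": city, "unit": unit, "temp": 21}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each requested tool and build the `tool` messages to send back."""
    messages = []
    for call in tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = LOCAL_TOOLS[call.function.name](**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


# Stand-in for res.choices[0].message.tool_calls; the id and arguments are made up.
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "Berlin"}'),
    )
]

tool_messages = run_tool_calls(fake_calls)
print(tool_messages)
```

In a real exchange you would append the assistant message and these tool messages to the conversation and call chat.completions.create again to get the final answer.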

Structured output (JSON schema)

const completion = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    { role: "user", content: "Summarize this in JSON: 'Aurous Labs is a B2B API platform.'" },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "summary",
      strict: true,
      schema: {
        type: "object",
        properties: {
          one_line: { type: "string" },
          tags: { type: "array", items: { type: "string" } },
        },
        required: ["one_line", "tags"],
        additionalProperties: false,
      },
    },
  },
});

const parsed = JSON.parse(completion.choices[0].message.content!);

We also accept response_format: { type: "json_object" } (looser; just guarantees parseable JSON). See Structured output for the schema constraints we enforce.

Embeddings (text)

res = client.embeddings.create(
    model="aurous-embed-vision-1.0",
    input="The quick brown fox jumps over the lazy dog.",
)
print(len(res.data[0].embedding))  # 2048 native

Models listing

const models = await client.models.list();
for (const m of models.data) {
  console.log(m.id, (m as any).kind, (m as any).aurous_metadata);
}

Every Aurous model row has a top-level kind (chat or embedding) and an aurous_metadata extension. The OpenAI TypeScript types don’t know about those fields, so cast the row to any or extend the type yourself if you want compile-time access.
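In Python, the kind field makes it easy to split the catalog into chat and embedding models. This is a minimal sketch, assuming each row has already been converted to a plain dict. The helper name ids_by_kind and the sample rows are illustrative, and only the id and kind fields are taken from the description above.

```python
from collections import defaultdict


def ids_by_kind(rows):
    """Group model ids by their top-level `kind` field ("chat" or "embedding")."""
    grouped = defaultdict(list)
    for row in rows:
        # Rows without a `kind` (e.g. from another provider) land under "unknown".
        grouped[row.get("kind", "unknown")].append(row["id"])
    return dict(grouped)


# Hypothetical rows shaped like entries of the `data` array from GET /v1/models.
sample_rows = [
    {"id": "aurous-grow-2.0-pro", "kind": "chat"},
    {"id": "aurous-embed-vision-1.0", "kind": "embedding"},
]

catalog = ids_by_kind(sample_rows)
print(catalog)
```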

What needs an extra line

Multimodal embedding input

The OpenAI SDK types the input of embeddings.create as text only (a string, an array of strings, or pre-tokenized integer arrays), so it cannot carry content parts. To send multimodal content parts (text + image + video), use the SDK’s lower-level client.post() escape hatch:

const res = await client.post("/embeddings", {
  body: {
    model: "aurous-embed-vision-1.0",
    input: [
      { type: "text", text: "Photo of a golden retriever in a park" },
      { type: "image_url", image_url: { url: "https://example.com/dog.jpg" } },
    ],
  },
}) as { data: { embedding: number[] }[]; usage: Record<string, unknown> };

console.log(res.data[0].embedding.length); // 2048

res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
        ],
    },
    cast_to=dict,
)

See Multimodal embeddings for the full content-parts taxonomy and the per-modality rate math.

Batch embeddings (`input: ["a", "b", "c"]`)

OpenAI treats input: ["a", "b", "c"] as a batch and returns three vectors, one per input. Aurous does NOT: the underlying multimodal embedding model concatenates the inputs into a single combined vector, which is the opposite of OpenAI’s N→N semantics. Calls with input: string[] are rejected with 400 embeddings_batch_not_supported. The OpenAI-shaped workaround is to loop client-side, sending one request per input. See OpenAI batch incompat for the rationale and the recommended loop pattern.
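The client-side loop can be as small as the sketch below. It is not the recommended pattern from the linked page, only an illustration of the idea: one call per input, with results kept in input order. The embed_each helper and the fake_embed stand-in are hypothetical; in real code the callable would wrap client.embeddings.create with a single string input.

```python
def embed_each(embed_one, texts):
    """Call `embed_one` once per input and return the vectors in input order."""
    return [embed_one(text) for text in texts]


def fake_embed(text):
    """Stand-in for a real call such as
    client.embeddings.create(model=..., input=text).data[0].embedding."""
    return [float(len(text)), 0.0]


vectors = embed_each(fake_embed, ["a", "bb", "ccc"])
print(vectors)
```

Each input costs one request against the RPM bucket, so large lists may need throttling or bounded concurrency.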

Headers and rate limits

Aurous response headers are a superset of OpenAI’s. Everything OpenAI emits is preserved; we add:

Aurous-Request-Id — req_<ulid> for support
Aurous-Version — the API version applied to this response
X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset — standard per-team RPM bucket
X-RateLimit-TPM-Limit / X-RateLimit-TPM-Remaining / X-RateLimit-TPM-Reset — additional per-team tokens-per-minute bucket on chat + embedding routes
Aurous-Idempotent-Replayed: true — present only when the response is a replay of a prior call with the same Idempotency-Key

OpenAI SDKs do not parse our custom headers; you can read them off the raw response object via the SDK’s with_raw_response (Python) or withResponse (Node) helpers.
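Once you have the raw headers, pulling out the fields listed above is a simple lookup. The sketch below works on any header mapping. It assumes the Remaining values are plain integers, which the text above does not state, and the sample header values are invented.

```python
def rate_limit_snapshot(headers):
    """Pick the Aurous-specific and rate-limit headers out of a header mapping."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        # Assumption: the value is a plain integer when the header is present.
        value = lowered.get(name.lower())
        return int(value) if value is not None else None

    return {
        "request_id": lowered.get("aurous-request-id"),
        "rpm_remaining": as_int("X-RateLimit-Remaining"),
        "tpm_remaining": as_int("X-RateLimit-TPM-Remaining"),
        # The header is only present on replays, so absence means "not a replay".
        "replayed": lowered.get("aurous-idempotent-replayed") == "true",
    }


# Hypothetical response headers; the values are made up for illustration.
sample_headers = {
    "Aurous-Request-Id": "req_example",
    "X-RateLimit-Remaining": "59",
    "X-RateLimit-TPM-Remaining": "119000",
}

snapshot = rate_limit_snapshot(sample_headers)
print(snapshot)
```

With the Python SDK, the mapping would come from the headers attribute of the object returned by client.chat.completions.with_raw_response.create(...), and calling parse() on that object yields the usual completion.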

Framework integrations

LangChain (Python)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="aurous-grow-2.0-pro",
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

print(llm.invoke("In 1 short sentence: what is an embedding?").content)

LangChain (Node)

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "aurous-grow-2.0-pro",
  configuration: {
    baseURL: "https://api.aurous-labs.com/v1",
  },
  apiKey: process.env.AUROUS_API_KEY!,
});

const res = await llm.invoke("In 1 short sentence: what is an embedding?");
console.log(res.content);

LlamaIndex (Python)

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="aurous-grow-2.0-pro",
    api_base="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

print(llm.complete("In 1 short sentence: what is an embedding?").text)

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const aurous = createOpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!,
});

const { text } = await generateText({
  model: aurous("aurous-grow-2.0-pro"),
  prompt: "In 1 short sentence: what is an embedding?",
});

console.log(text);

The AI SDK’s streaming, tool-calling, and structured-output helpers (streamText, generateObject) work without modification.

When OpenAI compatibility ends

A few things are intentionally NOT compatible:

Image generation — Aurous uses its own POST /v1/images surface (LoRA-based generation, async polling, anonymous-read output URLs). OpenAI’s images.generate SDK call won’t work — and we wouldn’t want it to, because the surfaces are fundamentally different (sync DALL-E vs. async LoRA-pipeline).
Fine-tuning — Aurous fine-tunes are LoRAs, not full-weight fine-tunes. They’re registered against the image generation surface, not the chat surface. The OpenAI fine-tuning SDK won’t work.
Audio (speech, transcription) — not yet on the platform.
Assistants / Threads API — not yet on the platform. The chat completions surface is stateless; if you need threading, build it client-side.

If you depend on one of those surfaces, keep your OpenAI client around for that surface only and use the Aurous client for everything else.

Where to next?

Chat completions — the full chat surface
Embeddings — the full embedding surface
Multimodal embeddings — the differentiator
How we count tokens — tokenizer details and the difference vs OpenAI’s tiktoken
Cost transparency — reading the credits_charged + breakdown story
Idempotency — safe retries on every modality
