If your application talks to chat.completions.create or embeddings.create via the official OpenAI SDK, you can switch to Aurous Labs by changing two lines: the baseURL and the apiKey. Streaming, tool calls, structured output, multimodal input, idempotency, and rate-limit headers all work without modification. This guide walks through the exact diff for Node, Python, and a few popular framework integrations.

The two-line change

// Before — OpenAI
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

// After — Aurous Labs
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!, // al_live_xxxxxxxxxxxxxxxx
});
That’s it. Every call you were already making (client.chat.completions.create, client.embeddings.create, client.chat.completions.stream, client.models.list) works against Aurous without further changes.
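The Python SDK takes the same two settings under snake_case names (base_url and api_key). The sketch below is an illustration rather than an official helper: the function names aurous_client_kwargs and make_client are hypothetical, and the openai import is deferred so the settings can be inspected without the package installed.

```python
import os

AUROUS_BASE_URL = "https://api.aurous-labs.com/v1"


def aurous_client_kwargs(env=None):
    """Return the two constructor arguments that differ from a stock OpenAI client."""
    env = os.environ if env is None else env
    return {
        "base_url": AUROUS_BASE_URL,
        "api_key": env["AUROUS_API_KEY"],  # al_live_xxxxxxxxxxxxxxxx
    }


def make_client(env=None):
    """Build an OpenAI SDK client pointed at Aurous (hypothetical helper)."""
    from openai import OpenAI  # deferred import: requires `pip install openai`

    return OpenAI(**aurous_client_kwargs(env))
```

Calling make_client() with AUROUS_API_KEY set in the environment should give you the client object that the Python snippets in this guide refer to as client.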

Model names

Aurous models use the aurous-* prefix. Swap your model identifier:
| OpenAI model | Aurous Labs equivalent | Notes |
| --- | --- | --- |
| gpt-4o | aurous-grow-2.0-pro | Multimodal, tool-capable, reasoning-capable, 256K ctx |
| gpt-4o-mini | aurous-grow-2.0-pro | Day-1 only ships one chat model; mini tier coming |
| text-embedding-3-* | aurous-embed-vision-1.0 | Text + image + video → single combined vector, 128K ctx |
The full catalog is at GET /v1/models. Each model row includes a top-level kind (chat or embedding) and an aurous_metadata extension with the context window, supported dimensions, capability flags, and per-modality credit rates.
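If model names are scattered through a codebase, one option is to translate them in a single place. The following is a minimal sketch that encodes only the mappings listed in the table above; the helper name to_aurous_model is hypothetical, and raising on unknown names is a design choice, not platform behavior.

```python
# Mappings copied from the table above; nothing else is assumed to exist.
EXACT = {
    "gpt-4o": "aurous-grow-2.0-pro",
    "gpt-4o-mini": "aurous-grow-2.0-pro",
}
EMBEDDING_PREFIX = "text-embedding-3-"
EMBEDDING_MODEL = "aurous-embed-vision-1.0"


def to_aurous_model(openai_model: str) -> str:
    """Translate an OpenAI model identifier to its listed Aurous equivalent."""
    if openai_model.startswith("aurous-"):
        return openai_model  # already an Aurous identifier
    if openai_model in EXACT:
        return EXACT[openai_model]
    if openai_model.startswith(EMBEDDING_PREFIX):
        return EMBEDDING_MODEL
    # Fail loudly rather than guess at an equivalent the table does not list.
    raise ValueError(f"No Aurous equivalent listed for {openai_model!r}")
```

Failing on unlisted names keeps a silent fallback from sending traffic to a model you did not choose.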

What works without changes

Chat completions (sync)

const completion = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  max_tokens: 64,
});

console.log(completion.choices[0].message.content);
console.log(completion.usage); // includes prompt_tokens, completion_tokens, total_tokens, credits_charged
The response shape is identical to OpenAI’s. The only Aurous extensions are usage.credits_charged (a number) and usage.breakdown (per-token math for cost reconciliation). If your code reads only usage.total_tokens, you never see them; if it logs the whole usage object, they appear as extra fields that can safely be ignored.
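For code that should run unchanged against both providers, it may help to treat the Aurous fields as optional. This sketch assumes the usage block is already a plain dict (for example from the Python SDK's model_dump(), provided your SDK version preserves unknown fields); the function name and the sample numbers are made up for illustration.

```python
def summarize_usage(usage: dict) -> dict:
    """Pull the standard token count plus the optional Aurous credit fields."""
    return {
        "total_tokens": usage["total_tokens"],
        # Aurous extensions; absent when the same code runs against OpenAI.
        "credits_charged": usage.get("credits_charged"),
        "breakdown": usage.get("breakdown"),
    }


# Made-up numbers, for illustration only.
sample = {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28,
    "credits_charged": 0.5,
}
summary = summarize_usage(sample)
```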

Streaming chat completions

const stream = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [{ role: "user", content: "Write a haiku about latency." }],
  max_tokens: 256,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
We emit SSE frames in OpenAI’s exact shape (data: {<chunk-json>}\n\n) terminated by data: [DONE]\n\n. The final non-[DONE] chunk carries the usage block with credits_charged.
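If you consume the stream without the SDK, the framing described above can be parsed in a few lines. The sketch below works on a complete response body held in a string; the sample frames are hand-written and much smaller than real chunks, and whether the final chunk has an empty choices array is an assumption the parser does not depend on.

```python
import json


def parse_sse(raw: str):
    """Return (text, usage) from an SSE body made of `data: ...` frames."""
    text_parts = []
    usage = None
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data:"):
            continue  # skip the empty tail and any non-data lines
        payload = frame[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator frame, carries no JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text_parts.append((choice.get("delta") or {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected on the last chunk before [DONE]
    return "".join(text_parts), usage


# Hand-written sample frames; real chunks carry more fields.
sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: {"choices": [], "usage": {"total_tokens": 5, "credits_charged": 1}}\n\n'
    "data: [DONE]\n\n"
)
text, usage = parse_sse(sample)
```

A production reader would parse incrementally as bytes arrive instead of buffering the whole body.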

Tool calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["city"],
            },
        },
    },
]

res = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)
print(res.choices[0].message.tool_calls)
tool_choice accepts the same shapes OpenAI does: "none", "auto", "required", and { "type": "function", "function": { "name": "..." } }.
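Once the model returns tool_calls, your code runs the functions and sends the results back as tool messages. The sketch below shows that dispatch step on plain dicts, assuming the tool calls arrive in the standard OpenAI wire shape; get_weather here is a stand-in implementation and LOCAL_TOOLS and run_tool_calls are hypothetical names.

```python
import json


def get_weather(city: str, unit: str = "c") -> dict:
    """Stand-in implementation; a real one would call a weather service."""
    return {"city": city, "unit": unit, "temp": 21}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list) -> list:
    """Execute each requested function and build the follow-up `tool` messages."""
    messages = []
    for call in tool_calls:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        # The arguments field is a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return messages


# Hand-written tool call in the OpenAI wire shape, for illustration.
sample_calls = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
}]
tool_messages = run_tool_calls(sample_calls)
```

Append the assistant message and these tool messages to the conversation, then call chat.completions.create again to get the final answer.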

Structured output (JSON schema)

const completion = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    { role: "user", content: "Summarize this in JSON: 'Aurous Labs is a B2B API platform.'" },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "summary",
      strict: true,
      schema: {
        type: "object",
        properties: {
          one_line: { type: "string" },
          tags: { type: "array", items: { type: "string" } },
        },
        required: ["one_line", "tags"],
        additionalProperties: false,
      },
    },
  },
});

const parsed = JSON.parse(completion.choices[0].message.content!);
We also accept response_format: { type: "json_object" } (looser; just guarantees parseable JSON). See Structured output for the schema constraints we enforce.
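Because json_object mode guarantees only parseable JSON, a client-side shape check may still be worthwhile there. A minimal sketch for the summary shape used in the example above; parse_summary is a hypothetical helper, not part of any SDK.

```python
import json


def parse_summary(content: str) -> dict:
    """Parse model output and confirm it has the fields the caller relies on."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("one_line"), str):
        raise ValueError("one_line must be a string")
    tags = data.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")
    return data


# Hand-written output, for illustration.
parsed = parse_summary('{"one_line": "A B2B API platform.", "tags": ["api", "b2b"]}')
```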

Embeddings (text)

res = client.embeddings.create(
    model="aurous-embed-vision-1.0",
    input="The quick brown fox jumps over the lazy dog.",
)
print(len(res.data[0].embedding))  # 2048 native

Models listing

const models = await client.models.list();
for (const m of models.data) {
  console.log(m.id, (m as any).kind, (m as any).aurous_metadata);
}
Every Aurous model row has a top-level kind (chat or embedding) and an aurous_metadata extension. The OpenAI TypeScript types don’t know about those fields, so cast the row to any or extend the type yourself if you want compile-time access.

What needs an extra line

Multimodal embedding input

OpenAI’s embeddings.create accepts only string or string[] for input. To send multimodal content parts (text + image + video), use the SDK’s lower-level client.post() escape hatch:
const res = await client.post("/embeddings", {
  body: {
    model: "aurous-embed-vision-1.0",
    input: [
      { type: "text", text: "Photo of a golden retriever in a park" },
      { type: "image_url", image_url: { url: "https://example.com/dog.jpg" } },
    ],
  },
}) as { data: { embedding: number[] }[]; usage: Record<string, unknown> };

console.log(res.data[0].embedding.length); // 2048

# Python equivalent
res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
        ],
    },
    cast_to=dict,
)
See Multimodal embeddings for the full content-parts taxonomy and the per-modality rate math.

Batch embeddings (input: ["a", "b", "c"])

OpenAI treats input: ["a", "b", "c"] as a batch and returns three vectors, one per string. Aurous does NOT: the underlying multimodal embedding model concatenates the inputs into a single combined vector, which is the opposite of OpenAI’s N→N semantics. Calls with input: string[] are therefore rejected with 400 embeddings_batch_not_supported. The OpenAI-shaped workaround is to loop client-side, sending one request per string. See OpenAI batch incompat for the rationale and the recommended loop pattern.
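As a rough illustration of the client-side loop (the linked page has the recommended pattern, and this is not a copy of it), the sketch below takes any single-input embedding function and calls it once per text. In real use embed_one would wrap something like client.embeddings.create(model="aurous-embed-vision-1.0", input=text).data[0].embedding; fake_embed is a stand-in so the example runs offline.

```python
from typing import Callable, List, Sequence


def embed_each(
    embed_one: Callable[[str], List[float]],
    texts: Sequence[str],
) -> List[List[float]]:
    """Call the single-input embedding function once per text, preserving order."""
    return [embed_one(text) for text in texts]


def fake_embed(text: str) -> List[float]:
    """Stand-in so the sketch runs offline; not a real embedding."""
    return [float(len(text))]


vectors = embed_each(fake_embed, ["a", "bb", "ccc"])
```

Each text becomes its own request, so a long list counts against the per-team rate limits once per item.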

Headers and rate limits

Aurous response headers are a superset of OpenAI’s. Everything OpenAI emits is preserved; we add:
  • Aurous-Request-Id: req_<ulid>, for support
  • Aurous-Version — the API version applied to this response
  • X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset — standard per-team RPM bucket
  • X-RateLimit-TPM-Limit / X-RateLimit-TPM-Remaining / X-RateLimit-TPM-Reset — additional per-team tokens-per-minute bucket on chat + embedding routes
  • Aurous-Idempotent-Replayed: true — present only when the response is a replay of a prior call with the same Idempotency-Key
OpenAI SDKs do not parse our custom headers; you can read them off the raw response object via the SDK’s with_raw_response (Python) or withResponse (Node) helpers.
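Once you have the raw headers, pulling out the Aurous-specific values is a dictionary lookup. The sketch below assumes the headers have been copied into a plain dict and that the Remaining values are integers, which this guide does not state; it leaves the Reset headers alone because their format is not specified here. The function name and sample values are hypothetical.

```python
def read_aurous_headers(headers: dict) -> dict:
    """Extract Aurous-specific response headers from a plain header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str):
        value = lowered.get(name.lower())
        return int(value) if value is not None else None  # assumes integer values

    return {
        "request_id": lowered.get("aurous-request-id"),
        "api_version": lowered.get("aurous-version"),
        "rpm_remaining": as_int("X-RateLimit-Remaining"),
        "tpm_remaining": as_int("X-RateLimit-TPM-Remaining"),
        # Present only on replays of a prior call with the same Idempotency-Key.
        "replayed": lowered.get("aurous-idempotent-replayed") == "true",
    }


# Made-up header values, for illustration only.
info = read_aurous_headers({
    "Aurous-Request-Id": "req_EXAMPLE",
    "X-RateLimit-Remaining": "59",
    "X-RateLimit-TPM-Remaining": "90000",
})
```

Logging request_id alongside your own trace ID gives support something to look up.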

Framework integrations

LangChain (Python)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="aurous-grow-2.0-pro",
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

print(llm.invoke("In 1 short sentence: what is an embedding?").content)

LangChain (Node)

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "aurous-grow-2.0-pro",
  configuration: {
    baseURL: "https://api.aurous-labs.com/v1",
  },
  apiKey: process.env.AUROUS_API_KEY!,
});

const res = await llm.invoke("In 1 short sentence: what is an embedding?");
console.log(res.content);

LlamaIndex (Python)

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="aurous-grow-2.0-pro",
    api_base="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

print(llm.complete("In 1 short sentence: what is an embedding?").text)

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const aurous = createOpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!,
});

const { text } = await generateText({
  model: aurous("aurous-grow-2.0-pro"),
  prompt: "In 1 short sentence: what is an embedding?",
});

console.log(text);
The AI SDK’s streaming, tool-calling, and structured-output helpers (streamText, generateObject) work without modification.

When OpenAI compatibility ends

A few things are intentionally NOT compatible:
  • Image generation — Aurous uses its own POST /v1/images surface (LoRA-based generation, async polling, anonymous-read output URLs). OpenAI’s images.generate SDK call won’t work — and we wouldn’t want it to, because the surfaces are fundamentally different (sync DALL-E vs. async LoRA-pipeline).
  • Fine-tuning — Aurous fine-tunes are LoRAs, not full-weight fine-tunes. They’re registered against the image generation surface, not the chat surface. The OpenAI fine-tuning SDK won’t work.
  • Audio (speech, transcription) — not yet on the platform.
  • Assistants / Threads API — not yet on the platform. The chat completions surface is stateless; if you need threading, build it client-side.
If you depend on one of those surfaces, keep your OpenAI client around for that surface only and use the Aurous client for everything else.
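One way to keep the two clients apart is a small routing table keyed by surface. This is a sketch only: the surface labels and the provider_for helper are hypothetical names chosen for this example, and the split simply mirrors the list above.

```python
# Surfaces this guide lists as not OpenAI-compatible on Aurous.
OPENAI_ONLY_SURFACES = {"images", "fine_tuning", "audio", "assistants"}
# Surfaces this guide shows working through the OpenAI SDK against Aurous.
AUROUS_SURFACES = {"chat", "embeddings", "models"}


def provider_for(surface: str) -> str:
    """Return which client ("aurous" or "openai") should serve a surface."""
    if surface in AUROUS_SURFACES:
        return "aurous"
    if surface in OPENAI_ONLY_SURFACES:
        return "openai"
    raise ValueError(f"Unknown surface: {surface!r}")
```

Your application would then hold one client per provider and pick between them with provider_for("chat"), provider_for("audio"), and so on.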
