Chat completions

POST /v1/chat/completions is the entry point for conversational and tool-driven LLM workloads. The surface is intentionally OpenAI-compatible — if your code already talks to chat.completions.create, you can point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.

baseURL: https://api.aurous-labs.com/v1
apiKey:  al_live_<your-hex-key>

Pass the key either as Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_.... Both are accepted.
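
As a minimal sketch of the two header styles described above, the helper below builds either one. The function name `auth_headers` and its `style` parameter are illustrative assumptions, not part of any SDK.

```python
from typing import Dict


def auth_headers(api_key: str, style: str = "bearer") -> Dict[str, str]:
    """Return the auth header for one of the two accepted styles."""
    if style == "bearer":
        # The style OpenAI SDKs send by default.
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"X-Api-Key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")
```

Send one style or the other; the text above does not say which takes precedence when both are present.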

What’s supported

Streaming via stream: true (Server-Sent Events). See Streaming.
Function / tool calling via tools and tool_choice. See Tools.
Multimodal input — image and video parts inside messages[*].content. See Multimodal.
Reasoning effort for reasoning-capable models. See Reasoning.
Structured output via response_format: { type: "json_schema", ... } or { type: "json_object" }.
Idempotency via the Idempotency-Key header — on non-streamed requests only. Streamed requests echo Aurous-Idempotency-Status: ignored_streaming and emit a warning frame as the first SSE data line. Use stream: false for at-most-once semantics.
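
To illustrate the structured-output entry in the list above, here is a small sketch of a request body that uses `response_format: { type: "json_object" }`, plus a helper that parses the reply. The helper names are hypothetical, and the response is assumed to follow the non-streamed shape shown under Response shape.

```python
import json


def json_mode_body(model: str, user_prompt: str, max_tokens: int = 512) -> dict:
    """Build a chat request body that asks for a JSON object reply."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "response_format": {"type": "json_object"},
        "max_tokens": max_tokens,
    }


def parse_json_reply(completion: dict) -> dict:
    """Decode the assistant message content of a non-streamed response."""
    content = completion["choices"][0]["message"]["content"]
    return json.loads(content)  # raises json.JSONDecodeError on invalid JSON
```

If `max_tokens` truncates the reply mid-object, `json.loads` will raise, so callers may want to check `finish_reason` first.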

Models

List available models at GET /v1/models. The day-1 chat model is aurous-grow-2.0-pro — a multimodal, tool-capable, reasoning-capable model with a 256K context window. Capabilities, context window, default and hard-cap max_output_tokens, and credit rates are returned on the aurous_metadata extension of each model row.
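
The sketch below shows one way to pull the `aurous_metadata` extension for a given model out of a `GET /v1/models` payload. It assumes the listing follows the OpenAI list shape (`{"object": "list", "data": [...]}` with an `id` per row) and that `max_output_tokens_hard_cap` sits inside `aurous_metadata`. The sample payload and its value are placeholders, not real limits.

```python
def model_metadata(models_payload: dict, model_id: str) -> dict:
    """Return the aurous_metadata extension for one model row."""
    for row in models_payload.get("data", []):
        if row.get("id") == model_id:
            return row.get("aurous_metadata", {})
    raise KeyError(f"model not listed: {model_id}")


# Hypothetical payload; the hard-cap value is a placeholder.
sample = {
    "object": "list",
    "data": [
        {
            "id": "aurous-grow-2.0-pro",
            "aurous_metadata": {"max_output_tokens_hard_cap": 8192},
        }
    ],
}
meta = model_metadata(sample, "aurous-grow-2.0-pro")
print(meta["max_output_tokens_hard_cap"])
```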

Pricing

Chat is billed per token at credit rates surfaced on each model row at GET /v1/models (the chat_pricing block) and in the usage block of every chat completion response (including the final chunk of a streamed response). The rate on /v1/models is the caller’s effective rate — any per-team override is already applied. See Pricing for the credit math, version pinning, and the rate mutability rules.

Quick start

curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize the Linear product update in 3 bullets." }
    ],
    "max_tokens": 512
  }'

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!, // al_live_xxxxxxxxxxxxxxxx
});

const completion = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Summarize the Linear product update in 3 bullets." },
  ],
  max_tokens: 512,
});

console.log(completion.choices[0].message.content);
console.log(completion.usage); // includes credits_charged

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",  # or read from env
)

completion = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Linear product update in 3 bullets."},
    ],
    max_tokens=512,
)

print(completion.choices[0].message.content)
print(completion.usage)  # includes credits_charged

Response shape

Non-streamed responses follow the standard OpenAI shape, with an Aurous extension on usage:

{
  "id": "cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM",
  "object": "chat.completion",
  "created": 1731948000,
  "model": "aurous-grow-2.0-pro",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "...", "tool_calls": null },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 200,
    "completion_tokens": 600,
    "total_tokens": 800,
    "credits_charged": 0.285,
    "breakdown": {
      "input_credits": 0.015,
      "output_credits": 0.270,
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 7
    }
  }
}

The usage.credits_charged value is the exact amount deducted from your team’s balance. breakdown.pricing_version is the rate-card version applied to this inference; it is pinned at request time, so later rate changes by an admin do not affect the charge.
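
In the sample response, the token counts and the credit figures add up: 200 + 600 = 800 tokens, and 0.015 + 0.270 = 0.285 credits. The sketch below checks both sums on a usage block. It assumes, as the sample suggests but the text does not state, that `credits_charged` always equals `input_credits + output_credits`. The function name is hypothetical.

```python
import math


def usage_is_consistent(usage: dict) -> bool:
    """Check that token and credit totals match their parts."""
    breakdown = usage["breakdown"]
    tokens_ok = (
        usage["total_tokens"]
        == usage["prompt_tokens"] + usage["completion_tokens"]
    )
    # Compare with a tolerance; the credit values are floats.
    credits_ok = math.isclose(
        usage["credits_charged"],
        breakdown["input_credits"] + breakdown["output_credits"],
        abs_tol=1e-9,
    )
    return tokens_ok and credits_ok
```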

Idempotency

Pass Idempotency-Key (any opaque value, 1–256 chars; UUID v4 recommended) to make non-streamed POST /v1/chat/completions safe to retry. Same key + same body within 24h replays the cached response with Aurous-Idempotent-Replayed: true. Same key + different body returns 409 idempotency_key_in_use. Streamed requests ignore the header — see Streaming.
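
A retry is only safe if every attempt reuses the same key and the same body. The sketch below generates one UUID v4 per logical request and reuses it across attempts. The `send` callable is a hypothetical transport (it stands in for whatever HTTP client you use), and treating `ConnectionError` as the only retryable failure is a simplifying assumption.

```python
import uuid
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], dict]  # (status, headers, parsed JSON)


def post_with_retries(
    send: Callable[[Dict[str, str], dict], Response],
    body: dict,
    max_attempts: int = 3,
) -> Response:
    """Call send(headers, body), reusing one Idempotency-Key for all attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error: Exception = ConnectionError("no attempt made")
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # same key and body on the next attempt
    raise last_error


def was_replayed(response_headers: Dict[str, str]) -> bool:
    """True if the response carries Aurous-Idempotent-Replayed: true."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return lowered.get("aurous-idempotent-replayed", "").lower() == "true"
```

A production version would likely add a delay between attempts; it is left out here to keep the key-reuse logic visible.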

Cancellation

In-flight streamed requests can be aborted via POST /v1/chat/completions/{id}/cancel. Partial actuals (tokens delivered before the abort) are committed; the remainder of the held credits is released. See the cancel pattern in Streaming.
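
As a rough sketch, the cancel call can be built with the standard library as shown below. It assumes that `{id}` is the completion id reported by the stream and that the endpoint needs no request body; neither detail is confirmed on this page. The helper name is hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.aurous-labs.com/v1"


def build_cancel_request(completion_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that aborts an in-flight stream."""
    safe_id = urllib.parse.quote(completion_id, safe="")
    url = f"{BASE_URL}/chat/completions/{safe_id}/cancel"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.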

Errors

Every non-2xx response uses the standard Aurous error envelope — see Errors for the full taxonomy. Chat-specific codes you might see:

model_not_found (404): unknown model slug. Check the /v1/models listing.
model_disabled (403): the model exists, but an admin has deactivated it.
model_wrong_kind (400): you passed an embedding model to the chat endpoint (or a chat model to the embeddings endpoint).
max_tokens_exceeds_hard_cap (400): the requested max_tokens is over the model’s hard cap. The model’s max_output_tokens_hard_cap is on the /v1/models row.
missing_max_tokens_no_model_default (400): the model has no platform-side default, so you must pass max_tokens explicitly.
max_input_tokens_exceeded (400): your prompt is over the model’s context window. Trim the input or pick a model with a larger window.
tpm_rate_limit_exceeded (429): the tokens-per-minute bucket is empty. Wait for the interval given in the Retry-After header, then retry.
provider_rate_limited (503): the upstream provider throttled the request. The response echoes the upstream Retry-After header; wait that long, then retry.
chat_provider_unavailable (502): transient upstream failure. Retry with backoff.
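
The retry guidance in the list above can be sketched as a small policy function. It takes the error code and the raw Retry-After header value as arguments, because how the code is read out of the error envelope is covered in Errors and not assumed here. The function name, the backoff base, and the cap are illustrative choices.

```python
from typing import Optional

# Codes the list above describes as retryable.
RETRYABLE_CODES = {
    "tpm_rate_limit_exceeded",
    "provider_rate_limited",
    "chat_provider_unavailable",
}


def retry_delay(
    code: str,
    retry_after: Optional[str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if code not in RETRYABLE_CODES:
        return None
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # No usable Retry-After: exponential backoff, capped.
    return min(cap, base * 2 ** attempt)
```

Here `attempt` starts at 0, so the fallback delays run 1, 2, 4, ... seconds up to the cap.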
