POST /v1/chat/completions is the entry point for conversational and tool-driven LLM workloads. The surface is intentionally OpenAI-compatible — if your code already talks to chat.completions.create, you can point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.
baseURL: https://api.aurous-labs.com/v1
apiKey:  al_live_<your-hex-key>
Pass the key either as Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_.... Both are accepted.
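
As a small illustration of the two accepted header styles, the sketch below builds either one. The helper name `auth_headers` and the `style` values are hypothetical, and the key shown is a placeholder rather than a real credential.

```python
BASE_URL = "https://api.aurous-labs.com/v1"


def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Return the auth header for one of the two accepted styles."""
    if style == "bearer":
        # What OpenAI SDKs send by default.
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"X-Api-Key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


# Placeholder key for illustration only.
headers = auth_headers("al_live_0123abcd")
```

Either dictionary can be merged into the headers of any HTTP client; only one of the two styles is needed per request.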

What’s supported

  • Streaming via stream: true (Server-Sent Events). See Streaming.
  • Function / tool calling via tools and tool_choice. See Tools.
  • Multimodal input — image and video parts inside messages[*].content. See Multimodal.
  • Reasoning effort for reasoning-capable models. See Reasoning.
  • Structured output via response_format: { type: "json_schema", ... } or { type: "json_object" }.
  • Idempotency via the Idempotency-Key header — on non-streamed requests only. Streamed requests echo Aurous-Idempotency-Status: ignored_streaming and emit a warning frame as the first SSE data line. Use stream: false for at-most-once semantics.
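
To make the structured-output option above concrete, here is a minimal sketch of a non-streamed request body that uses the `{ type: "json_object" }` form, plus a parser for the reply. The helper names and the system prompt are hypothetical; the sketch assumes the assistant's JSON arrives as a string in `choices[0].message.content`, as in the OpenAI-compatible response shape.

```python
import json


def build_json_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build a non-streamed chat body that asks for a JSON object reply."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
        "stream": False,  # non-streamed, so Idempotency-Key is honored
        "max_tokens": max_tokens,
    }


def parse_json_reply(response: dict) -> dict:
    """Decode the assistant message content as JSON."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)
```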

Models

List available models at GET /v1/models. The day-1 chat model is aurous-grow-2.0-pro — a multimodal, tool-capable, reasoning-capable model with a 256K context window. Capabilities, context window, default and hard-cap max_output_tokens, and credit rates are returned on the aurous_metadata extension of each model row.
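
One possible use of the model listing is to clamp a requested `max_tokens` before sending. The sketch below assumes the listing follows the OpenAI-style `{"data": [...]}` layout and that the hard cap sits at `aurous_metadata.max_output_tokens_hard_cap`; both the nesting and the sample cap value are assumptions, so check them against a real GET /v1/models response.

```python
def find_model(listing: dict, model_id: str) -> dict:
    """Return the row for model_id from a /v1/models-style listing."""
    for row in listing.get("data", []):
        if row.get("id") == model_id:
            return row
    raise KeyError(model_id)


def clamp_max_tokens(row: dict, requested: int) -> int:
    """Lower `requested` to the model's hard cap when one is present."""
    # Assumed location of the cap; the real nesting may differ.
    cap = row.get("aurous_metadata", {}).get("max_output_tokens_hard_cap")
    return requested if cap is None else min(requested, cap)


# Illustrative listing; the cap value here is made up.
sample_listing = {
    "object": "list",
    "data": [
        {
            "id": "aurous-grow-2.0-pro",
            "aurous_metadata": {"max_output_tokens_hard_cap": 8192},
        }
    ],
}
row = find_model(sample_listing, "aurous-grow-2.0-pro")
safe_max_tokens = clamp_max_tokens(row, 20000)
```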

Pricing

Chat is billed per token at credit rates surfaced on each model row at GET /v1/models (the chat_pricing block) and in the usage block of every chat completion response (including the final chunk of a streamed response). The rate on /v1/models is the caller’s effective rate — any per-team override is already applied. See Pricing for the credit math, version pinning, and the rate mutability rules.

Quick start

curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize the Linear product update in 3 bullets." }
    ],
    "max_tokens": 512
  }'
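
For callers not using curl or an OpenAI SDK, the same request can be sketched with the Python standard library. The function names `build_chat_request` and `send` are hypothetical; the URL, headers, and body fields mirror the curl call above.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.aurous-labs.com/v1/chat/completions"


def build_chat_request(api_key, messages, model="aurous-grow-2.0-pro",
                       max_tokens=512, idempotency_key=None):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Same role as $(uuidgen) in the curl call.
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(build_chat_request(key, messages))` with a key read from the `AUROUS_API_KEY` environment variable would perform the same call as the curl command; non-2xx responses surface as `urllib.error.HTTPError`.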

Response shape

Non-streamed responses follow the standard OpenAI shape, with an Aurous extension on usage:
{
  "id": "cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM",
  "object": "chat.completion",
  "created": 1731948000,
  "model": "aurous-grow-2.0-pro",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "...", "tool_calls": null },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 200,
    "completion_tokens": 600,
    "total_tokens": 800,
    "credits_charged": 0.285,
    "breakdown": {
      "input_credits": 0.015,
      "output_credits": 0.270,
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 7
    }
  }
}
The usage.credits_charged value is the exact amount deducted from your team’s balance. The breakdown.pricing_version is the rate-card version snapshot applied to this inference. It is pinned at request time, even if an admin updates rates afterward.
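
In the sample response, the token counts add up (200 + 600 = 800) and so do the credits (0.015 + 0.270 = 0.285). A client that logs spend could run a sanity check along these lines. This is a sketch that assumes the breakdown always sums to `credits_charged`, which the sample suggests but the text does not guarantee; `check_usage` is a hypothetical helper.

```python
import math


def check_usage(usage: dict, tol: float = 1e-9) -> bool:
    """Return True when token and credit totals in `usage` are consistent."""
    tokens_ok = (
        usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"]
    )
    breakdown = usage["breakdown"]
    # Compare floats with a tolerance rather than ==.
    credits_ok = math.isclose(
        breakdown["input_credits"] + breakdown["output_credits"],
        usage["credits_charged"],
        abs_tol=tol,
    )
    return tokens_ok and credits_ok


# Values copied from the sample response.
sample_usage = {
    "prompt_tokens": 200,
    "completion_tokens": 600,
    "total_tokens": 800,
    "credits_charged": 0.285,
    "breakdown": {
        "input_credits": 0.015,
        "output_credits": 0.270,
        "model": "aurous-grow-2.0-pro",
        "pricing_version": 7,
    },
}
```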

Idempotency

Pass Idempotency-Key (any opaque value, 1–256 chars; UUID v4 recommended) to make non-streamed POST /v1/chat/completions safe to retry. Same key + same body within 24h replays the cached response with Aurous-Idempotent-Replayed: true. Same key + different body returns 409 idempotency_key_in_use. Streamed requests ignore the header — see Streaming.
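
The retry pattern this enables can be sketched as follows: generate the key once, then reuse it on every attempt so that a retry after a lost response replays the cached result instead of running inference again. The `send` callable is a hypothetical transport that returns `(status, headers, parsed_body)`, and the sketch assumes header names are looked up with the exact casing shown.

```python
import uuid


def post_with_retry(send, body, max_attempts=3):
    """POST `body` with one Idempotency-Key shared across all attempts.

    Returns (parsed_body, replayed), where `replayed` is True when the
    server answered from its idempotency cache.
    """
    key = str(uuid.uuid4())  # generated once, never regenerated per attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            status, headers, payload = send(body, key)
        except (ConnectionError, TimeoutError) as exc:
            # The request may or may not have been processed; retrying
            # with the same key and body is what makes this safe.
            last_error = exc
            continue
        if status == 409:
            # idempotency_key_in_use: same key, different body.
            raise RuntimeError("idempotency key reused with a different body")
        replayed = headers.get("Aurous-Idempotent-Replayed") == "true"
        return payload, replayed
    raise RuntimeError("all attempts failed") from last_error
```

The body must be byte-for-byte the same on each attempt; changing it while keeping the key is what triggers the 409.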

Cancellation

In-flight streamed requests can be aborted via POST /v1/chat/completions/{id}/cancel. Partial actuals (tokens delivered before the abort) are committed; the remainder of the held credits is released. See the cancel pattern in Streaming.
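
A cancel call might be built as in the sketch below. It assumes the `{id}` in the path is the completion id (the `cmp_...` value) as seen by the client and that the endpoint needs no request body; `build_cancel_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.aurous-labs.com/v1"


def build_cancel_request(api_key: str, completion_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST .../{id}/cancel request."""
    # Percent-encode the id so it cannot alter the URL path.
    path_id = urllib.parse.quote(completion_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions/{path_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```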

Errors

Every non-2xx response uses the standard Aurous error envelope — see Errors for the full taxonomy. Chat-specific codes you might see:
  • model_not_found (404): unknown model slug. Check the /v1/models listing.
  • model_disabled (403): the model exists, but an admin has deactivated it.
  • model_wrong_kind (400): the request named an embedding model on the chat endpoint (or vice versa).
  • max_tokens_exceeds_hard_cap (400): the requested max_tokens is over the model’s hard cap. The model’s max_output_tokens_hard_cap is on the /v1/models row.
  • missing_max_tokens_no_model_default (400): the model has no platform-side default, so you must pass max_tokens explicitly.
  • max_input_tokens_exceeded (400): your prompt is over the model’s context window. Trim the input or pick a model with a larger window.
  • tpm_rate_limit_exceeded (429): the tokens-per-minute bucket is exhausted. Wait for the Retry-After interval, then retry.
  • provider_rate_limited (503): the upstream provider throttled the request. The Retry-After header is echoed.
  • chat_provider_unavailable (502): transient upstream failure. Retry with backoff.
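
The retry guidance in the list above could be folded into one helper, sketched here. It treats only the three codes that the list marks as retryable, honors Retry-After when it is a number of seconds, and otherwise falls back to exponential backoff with full jitter. The function name, the `base` and `cap` defaults, and the jitter strategy are assumptions rather than documented behavior.

```python
import random

RETRYABLE_CODES = {
    "tpm_rate_limit_exceeded",    # 429
    "provider_rate_limited",      # 503
    "chat_provider_unavailable",  # 502
}


def retry_delay(code, headers, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is zero-based: 0 for the first retry.
    """
    if code not in RETRYABLE_CODES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled here; fall through
    # Exponential backoff with full jitter, bounded by `cap`.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds and resend; a `None` result means the error should be surfaced rather than retried.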