POST /v1/chat/completions is the entry point for conversational and tool-driven LLM workloads. The surface is intentionally OpenAI-compatible — if your code already talks to chat.completions.create, you can point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.
Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_.... Both are accepted.
What’s supported
- Streaming via
stream: true(Server-Sent Events). See Streaming. - Function / tool calling via
toolsandtool_choice. See Tools. - Multimodal input — image and video parts inside
messages[*].content. See Multimodal. - Reasoning effort for reasoning-capable models. See Reasoning.
- Structured output via
response_format: { type: "json_schema", ... }or{ type: "json_object" }. - Idempotency via the
Idempotency-Keyheader — on non-streamed requests only. Streamed requests echoAurous-Idempotency-Status: ignored_streamingand emit a warning frame as the first SSE data line. Usestream: falsefor at-most-once semantics.
Models
List available models atGET /v1/models. The day-1 chat model is aurous-grow-2.0-pro — a multimodal, tool-capable, reasoning-capable model with a 256K context window. Capabilities, context window, default and hard-cap max_output_tokens, and credit rates are returned on the aurous_metadata extension of each model row.
Pricing
Chat is billed per token at credit rates surfaced on each model row atGET /v1/models (the chat_pricing block) and in the usage block of every chat completion response (including the final chunk of a streamed response). The rate on /v1/models is the caller’s effective rate — any per-team override is already applied. See Pricing for the credit math, version pinning, and the rate mutability rules.
Quick start
Response shape
Non-streamed responses are standard OpenAI shape with an Aurous extension onusage:
usage.credits_charged value is the exact amount deducted from your team’s balance. The breakdown.pricing_version is the rate-card version snapshot applied to this inference — pinned at request time even if admin updates rates afterward.
Idempotency
PassIdempotency-Key (any opaque value, 1–256 chars; UUID v4 recommended) to make non-streamed POST /v1/chat/completions safe to retry. Same key + same body within 24h replays the cached response with Aurous-Idempotent-Replayed: true. Same key + different body returns 409 idempotency_key_in_use. Streamed requests ignore the header — see Streaming.
Cancellation
In-flight streamed requests can be aborted viaPOST /v1/chat/completions/{id}/cancel. Partial actuals (tokens delivered before the abort) are committed; the remainder of the held credits is released. See the cancel pattern in Streaming.
Errors
Every non-2xx response uses the standard Aurous error envelope — see Errors for the full taxonomy. Chat-specific codes you might see:model_not_found(404) — unknownmodelslug. Check the/v1/modelslisting.model_disabled(403) — model exists but admin has deactivated it.model_wrong_kind(400) — you sent an embedding model to the chat endpoint (or vice versa).max_tokens_exceeds_hard_cap(400) — requestedmax_tokensis over the model’s hard cap. The model’smax_output_tokens_hard_capis on the/v1/modelsrow.missing_max_tokens_no_model_default(400) — the model has no platform-side default, so you must passmax_tokensexplicitly.max_input_tokens_exceeded(400) — your prompt is over the model’s context window. Trim input or pick a larger model.tpm_rate_limit_exceeded(429) — tokens-per-minute bucket is dry. SleepRetry-Afterand retry.provider_rate_limited(503) — upstream throttled. Retry-After echoed.chat_provider_unavailable(502) — upstream transient failure. Retry with backoff.

