Create a chat completion
OpenAI-compatible chat completion. Supports streaming (stream: true returns text/event-stream SSE), function calling (tools / tool_choice), multimodal input (image_url / video_url content parts), reasoning_effort, and structured output (response_format). Pricing: credits are debited from team balance at completion, snapshotted to the per-model pricing version that was active when the credit hold was placed.
Idempotency-Key is honored on non-streamed requests only (24h replay window). Streamed requests echo Aurous-Idempotency-Status: ignored_streaming and emit a warning frame as the first SSE data line; use stream=false for at-most-once semantics.
Image and video rates are frozen per Aurous-Version: customers pinned to an older version see the rate card that was current when their version shipped, so price changes are never silently retroactive. Chat and embedding rates are NOT frozen per Aurous-Version — they track the most recently published rate version for each model, because LLM provider economics shift on weekly cadence. The receipt echoed on every chat / embedding response carries the exact rate that applied at dispatch time, so audit trails remain stable.
Authorizations
Your team API key (starts with al_live_).
Headers
Echoed on the response. ignored_streaming when stream=true (idempotency does not apply); otherwise reflects the interceptor decision.
Stripe-style idempotency key. Honored only when stream=false. Same key + same canonical-JSON body returns the cached response with Aurous-Idempotent-Replayed: true. Same key + different body returns 409 idempotency_key_in_use. Replay window is 24 hours.
Optional API version pin (YYYY-MM-DD). Defaults to your team's pinned version, or the system default 2026-05-15 for unauthenticated requests.
^\d{4}-\d{2}-\d{2}$"2026-05-15"
Body
Public model slug from GET /v1/models. Example: aurous-grow-2.0-pro.
128"aurous-grow-2.0-pro"
Conversation messages, ordered oldest → newest. 1-200 messages.
Maximum tokens to generate. Defaults to the model’s default_max_output_tokens. Requests with max_tokens > the model’s max_output_tokens_hard_cap return 400 max_tokens_exceeds_hard_cap.
x >= 10 <= x <= 20 <= x <= 1If true, response is text/event-stream (SSE) with OpenAI-shaped chunks. When streaming, Idempotency-Key is ignored and a warning frame is emitted in the first chunk; use stream=false for at-most-once semantics.
OpenAI-compatible stream options. When stream: true, include_usage: true (the default on Aurous Labs) causes the final non-[DONE] SSE frame to carry the usage block with token counts + credits_charged. Set include_usage: false to suppress (matches OpenAI behavior when the flag is omitted on their side). Ignored when stream: false.
Function-calling routing: auto | none | {type:"function", function:{name}}. required is NOT supported and returns 400 tool_choice_required_unsupported.
Structured output. {type:"text"|"json_object"|"json_schema", json_schema?:{...}}. For json_schema, the schema must be ≤10KB JSON-stringified and ≤5 levels deep, else 400 response_format_too_large / response_format_too_deep.
Reasoning depth for reasoning-capable models. Higher values increase latency + token cost (reasoning tokens billed separately at the model’s output rate).
minimal, low, medium, high Random seed for reproducibility (when supported by the model).
Up to 4 stop sequences (string or array of strings; each ≤200 chars).
-2 <= x <= 2-2 <= x <= 2Return log-probabilities for output tokens.
0 <= x <= 20Response
Chat completion. Content type depends on stream: application/json for stream:false (full response body); text/event-stream for stream:true (SSE).
Opaque chat-completion id.
"cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM"
chat.completion Unix timestamp (seconds).
1731948000
Aurous model slug used.
"aurous-grow-2.0-pro"
Always exactly 1 entry in v1.0.
Aurous extension: terminal state of the chat completion. completed is the normal happy path. in_progress means the request is still being processed (only possible when polling GET /:id during a streamed request). failed means the provider or platform errored — see error for the typed reason. cancelled means a cancel was requested or the client disconnected.
completed, in_progress, failed, cancelled "completed"
Aurous extension: present only when status is failed or cancelled (in addition to the main error response when applicable). Echoes the typed error envelope shape so customers can branch on error.code without parsing string messages.
{
"code": "chat_provider_unavailable",
"type": "server_error",
"message": "Provider request failed."
}
