stream: true on POST /v1/chat/completions and the response switches from application/json to text/event-stream. Tokens are emitted as they’re generated; the final chunk carries the usage block (including credits_charged) before the standard data: [DONE] terminator.
Frame format
Each frame is a singledata: line followed by a blank line:
: keep-alive is sent every 15 seconds if no content chunk has been emitted — SSE clients ignore comment lines, so you can rely on it to keep the connection warm without affecting parsing.
Headers on a streamed response
Content-Type: text/event-streamCache-Control: no-cache, no-transformAurous-Request-Id: req_<ulid>— quote in support tickets.Aurous-Version: YYYY-MM-DD— the contract version applied.Aurous-Idempotency-Status: ignored_streaming— if you sentIdempotency-Key. The header is recorded but the key is NOT stored; the first SSE frame will be a warning (see Idempotency on streamed requests).X-RateLimit-Limit/X-RateLimit-Remaining/X-RateLimit-Reset— RPM bucket.X-RateLimit-TPM-Limit/X-RateLimit-TPM-Remaining/X-RateLimit-TPM-Reset— TPM (tokens-per-minute) bucket.
Example: streamed completion
Cancel an in-flight stream
Two ways to stop a stream:-
Close the connection — tear down the TCP/HTTP connection on the client. The server detects the disconnect, waits up to 5 seconds for the upstream provider’s final usage chunk, commits actuals up to the abort point, releases the remainder of the hold, and flips the row to
cancelled_client_disconnect. -
Call the cancel endpoint —
POST /v1/chat/completions/{id}/cancel. Aborts the upstream connection, commits actuals from chunks already delivered, releases the remainder, flips the row tocancelled_by_request, and returns the final-state record:
| Code | HTTP | When |
|---|---|---|
chat_cancel_target_not_found | 404 | id doesn’t exist for your team. |
chat_cancel_target_already_terminal | 409 | Already completed, failed, or cancelled. Idempotency hint, not a bug. |
chat_cancel_target_not_cancellable | 409 | Record exists and isn’t terminal but the stream can’t be aborted (e.g. a sync call that already returned). |
AbortController pattern (Node)
The OpenAI SDK accepts an AbortSignal, so you can wire the same controller into both the request and a “Stop” button in your UI:
/cancel afterward is optional — useful when you want to confirm the final-state record before clearing it from your UI.
AbortController pattern (Python)
Idempotency on streamed requests
Streamed requests cannot be replayed deterministically, soIdempotency-Key is intentionally not stored when stream: true. The server still echoes the header value back via Aurous-Idempotency-Status: ignored_streaming, and emits a warning frame as the FIRST data line so SDK callers see it before any token chunks:
warning shape, but they won’t crash on it either — the field is ignored. If you need at-most-once delivery (typical for billing-sensitive integrations), set stream: false and pass Idempotency-Key as usual.
Client disconnects
If the client connection drops mid-stream (network blip, browser tab closed, server-side timeout):- The server keeps the upstream connection open for up to 5 seconds in case the provider’s final usage chunk arrives.
- If usage arrives within the grace window, actuals are committed exactly.
- If not, the server commits a chunk-count fallback estimate, logs the incident, and flips the row to
cancelled_client_disconnect.
GET /v1/chat/completions/{id} with its final committed usage.
Throughput & rate limits
Streamed chat requests draw from two buckets:- RPM (requests/minute) —
X-RateLimit-Remainingheaders. - TPM (tokens/minute) —
X-RateLimit-TPM-Remainingheaders. The bucket counts estimated tokens at request time; actuals adjust the bucket on commit.
429 with Retry-After — see Rate limits.
