POST /v1/chat/completions accepts an Idempotency-Key header (or Aurous-Idempotency-Key) on non-streamed requests. A successful response for a given key + body combination is cached server-side for 24 hours; any later request with the same key and an identical body returns the cached response with Aurous-Idempotent-Replayed: true. Reusing a key with a different body returns 409 idempotency_key_in_use.
This is the same idempotency contract used across /v1/embeddings, /v1/images, and /v1/videos. See the global Idempotency page for the broad pattern; this page covers the details specific to chat completions.
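A minimal sketch of attaching the header to a non-streamed request, using only Python's standard library. The base URL, the Bearer authorization scheme, the model name, and the function name build_chat_request are assumptions for illustration, not documented values. The sketch builds the request without sending it, and it refuses stream: true as a deliberate design choice, since the key would be ignored on that path.

```python
import json
import urllib.request

API_BASE = "https://api.example.invalid"  # hypothetical; substitute your real base URL


def build_chat_request(api_key: str, body: dict, idempotency_key: str) -> urllib.request.Request:
    """Build (but don't send) a non-streamed chat completion request with an idempotency key."""
    if body.get("stream"):
        # Idempotency only applies to non-streamed requests, so fail loudly
        # instead of letting the caller believe a streamed call is protected.
        raise ValueError("idempotency keys are ignored when stream is true")
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        method="POST",
    )
```

Mint the key (for example with str(uuid.uuid4())) once per logical operation and pass the same key and the same body on every retry.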
When to use it
Use an idempotency key whenever a network-level retry could otherwise cause a duplicate charge:
- Your client library auto-retries on 5xx or network timeouts
- You’re calling from a background job that resumes after a worker restart
- You’re processing user-submitted content where the user might click “send” twice
- You’re inside a database transaction that might roll back and replay
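For the background-job and transaction-replay cases, the key has to survive a restart, so a random UUID held only in memory is not enough. One option, an assumption rather than a platform requirement, is to derive the key deterministically from a persisted job ID with a name-based UUIDv5. The namespace constant and the function name below are hypothetical.

```python
import uuid

# Hypothetical namespace for this application's keys. Any fixed UUID works;
# it only has to stay the same across deploys and worker restarts.
JOB_KEY_NAMESPACE = uuid.UUID("6f1c2a3e-8b7d-4c5e-9a0f-1b2c3d4e5f60")


def idempotency_key_for_job(job_id: str, step: str) -> str:
    """Derive a stable key for one logical operation inside a background job.

    uuid5 is deterministic: the same (job_id, step) pair always yields the
    same key, so a worker that restarts and replays the step reuses it.
    """
    return str(uuid.uuid5(JOB_KEY_NAMESPACE, f"{job_id}:{step}"))
```

A different step name (or a different job) yields a different key, so distinct operations never collide.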
Response headers
The canonical signal that a response is a replay is Aurous-Idempotent-Replayed: true. On the first successful call with a given key, the header is absent; on a replay of the same key and the same body, the header is present and set to true.
Aurous-Idempotency-Status is informational and currently emitted only on the streaming path:
- Aurous-Idempotency-Status: ignored_streaming, on stream: true requests that include an Idempotency-Key. The key is ignored (see Streaming exception).
- Aurous-Idempotency-Status: not_set, on stream: true requests with no key.
- Non-streamed responses do not currently emit this header; the presence or absence of Aurous-Idempotent-Replayed is the canonical signal. Surfacing accepted on non-streamed responses is on the v1.0.x roadmap.
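A sketch of how a client might read both headers, assuming the response headers are available as a plain mapping of names to values. The outcome labels returned here are hypothetical names chosen for this example. Header names are compared case-insensitively, as HTTP requires.

```python
from collections.abc import Mapping


def idempotency_outcome(headers: Mapping[str, str]) -> str:
    """Classify a chat completion response by its idempotency headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("aurous-idempotent-replayed", "").lower() == "true":
        return "replayed"  # cached response from an earlier identical request
    status = lowered.get("aurous-idempotency-status")
    if status == "ignored_streaming":
        return "key_ignored"  # stream: true; the key was sent but not honoured
    if status == "not_set":
        return "no_key"  # stream: true with no key
    # Non-streamed response without the replay header: either a first
    # execution or a request sent without a key; headers can't tell these apart.
    return "fresh"
```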
Conflict semantics
If you send the SAME idempotency key with a DIFFERENT body within the 24-hour window, the platform returns 409 idempotency_key_in_use.
The comparison covers the full POST body: model, messages, tools, response_format, temperature, max_tokens, and so on. A single-character change in the prompt is enough to cause a mismatch.
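To catch accidental key reuse before it reaches the server, a client can remember which body each key was first used with. This is a hypothetical client-side guard, not part of the platform; it hashes a canonical JSON form (sorted keys) so that dict ordering in your own code does not raise false alarms. How the server itself compares bodies is not specified here, so the safest practice is still to send the exact same serialized body on every retry. The registry lives in memory only and does not survive a restart.

```python
import hashlib
import json


class KeyRegistry:
    """Hypothetical client-side guard: remembers the body each key was used with."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}

    @staticmethod
    def fingerprint(body: dict) -> str:
        # Canonical JSON: sorted keys and no insignificant whitespace.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def check(self, key: str, body: dict) -> None:
        """Raise before sending if `key` was already used with a different body."""
        fp = self.fingerprint(body)
        previous = self._fingerprints.setdefault(key, fp)  # first body wins
        if previous != fp:
            raise ValueError(f"idempotency key {key!r} already used with a different body")
```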
Cross-route conflicts
Idempotency keys are scoped to your team, not to a single route. Using the same key on POST /v1/chat/completions and then on POST /v1/embeddings is treated as a body mismatch (the routes have different bodies) and returns the same 409 idempotency_key_in_use error.
Mint a fresh key per logical operation, OR scope your keys with a route prefix (chat-<uuid>, embed-<uuid>) if you’re concerned about collisions in your own code.
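A sketch of the route-prefix convention, assuming UUIDv4 for the random part. The function name scoped_key is hypothetical. A prefixed v4 UUID stays well under the 256-character key limit.

```python
import uuid


def scoped_key(route_prefix: str) -> str:
    """Mint a fresh key for one logical operation, prefixed by route (e.g. "chat", "embed")."""
    return f"{route_prefix}-{uuid.uuid4()}"
```

For example, scoped_key("chat") yields a key such as chat-<uuid>, and scoped_key("embed") an embed-<uuid> key, so a chat key can never be mistaken for an embeddings key in your own logs or stores.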
Key format
- 1-256 printable-ASCII characters; we recommend UUIDs (v4 or v7) or other opaque identifiers
- Empty or whitespace-only keys are rejected with 400 invalid_request
- Keys longer than 256 characters are rejected with 400 invalid_request
Keys don't have to be UUIDs: Idempotency-Key: my-job-2026-05-20-001 works fine. The only requirement is that each key is unique per logical operation.
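A client-side validator can fail fast before the server returns 400 invalid_request. This sketch assumes "printable ASCII" means code points 0x20 through 0x7E (so tabs and newlines are rejected); the server remains the authority on what it accepts, and the function name is hypothetical.

```python
MAX_KEY_LENGTH = 256


def validate_idempotency_key(key: str) -> None:
    """Raise ValueError if `key` breaks the documented format rules."""
    if not key or key.isspace():
        raise ValueError("key must not be empty or whitespace-only")
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"key must be at most {MAX_KEY_LENGTH} characters")
    # Assumed definition of printable ASCII: 0x20 (space) through 0x7E (~).
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in key):
        raise ValueError("key must contain printable ASCII characters only")
```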
Header aliases
Two header names are accepted (case-insensitive):
- Idempotency-Key: <value> (Stripe-style, recommended)
- Aurous-Idempotency-Key: <value> (vendor-prefixed alias)
Streaming exception
POST /v1/chat/completions with stream: true and an Idempotency-Key header is allowed, but idempotency does not apply: the platform emits Aurous-Idempotency-Status: ignored_streaming and sends a warning frame as the first SSE data line.
If you need a chat completion to be at-most-once, there are two workarounds:
- Use stream: false for that chat completion. The dominant users of this pattern are background jobs, where the streaming UX is irrelevant and the worker just needs the final assistant message.
- Client-side dedupe for streamed UI: track which cmp_<id> you’ve already shown the user; if the same logical operation retries, suppress the second stream.
A replay_on_idempotency option for streamed responses is under consideration for v1.1: it would cache the full final response and replay it as a single non-streamed frame on the retry. If you have a concrete use case, tell us at the feedback link.
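A sketch of the client-side dedupe workaround, assuming your application assigns its own ID to each logical operation and can read the cmp_<id> of each streamed completion. The class and method names are hypothetical. Note that this only prevents showing a duplicate; the retried stream is still a new completion and is still billed.

```python
class StreamDeduper:
    """Hypothetical client-side dedupe for streamed chat UI."""

    def __init__(self) -> None:
        # Maps your logical operation ID to the cmp_<id> shown first.
        self._shown: dict[str, str] = {}

    def should_render(self, operation_id: str, completion_id: str) -> bool:
        """Call once per chunk. Chunks from the stream that claimed the
        operation first keep rendering; chunks from any later stream for the
        same operation (a retry, which gets a new cmp_<id>) are suppressed."""
        first = self._shown.setdefault(operation_id, completion_id)
        return first == completion_id
```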
Window + storage
- Idempotency keys are cached for 24 hours after first use. After 24 hours the key is forgotten — sending the same key + same body at hour 25 mints a NEW completion and bills it.
- Caches are scoped per-team. Two different teams can use the same opaque key value without conflict.
- The cache stores the full response body, including cmp_<id>, the chat content, and the usage block, so the replay is semantically equivalent to the original. Field ordering may differ between the original and the replay (the replay reconstructs the JSON from the stored cache, not from the original serialization), but every value is byte-identical, so clients that parse JSON (which is everyone) are unaffected.
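Two small helpers illustrate the points above, with hypothetical names. The first checks whether a retry still falls inside the 24-hour window, assuming you record when you first sent the key; your clock only approximates the server's "first use", and whether exactly 24 hours still counts is not specified, so the sketch treats it as expired. The second compares an original response and a replay as parsed JSON rather than raw bytes, since field order may differ.

```python
import json
from datetime import datetime, timedelta

REPLAY_WINDOW = timedelta(hours=24)


def key_still_protected(first_sent_at: datetime, now: datetime) -> bool:
    """True while a retry with the same key + body should be replayed, not re-billed."""
    return now - first_sent_at < REPLAY_WINDOW


def same_completion(original_body: str, replay_body: str) -> bool:
    """Compare two response bodies as parsed JSON, ignoring key order."""
    return json.loads(original_body) == json.loads(replay_body)
```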
Combining idempotency with retries
The recommended retry pattern for chat completions:
- Mint a UUID per logical operation
- Use it on every retry of that operation
- Retry on 5xx and network errors, but NOT on 4xx (a 4xx means the request is malformed, so retrying won’t help)
- Use exponential backoff with jitter: start at 1s and double up to 32s
- Cap retries at 5 (chat completions take ~1-15s; 5 retries over ~60s is generous)
Passing idempotencyKey once and letting the SDK retry is the simplest pattern.
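If you manage retries yourself, the pattern can be sketched as below. The send callable and the two exception classes are hypothetical: send is assumed to perform the HTTP call with the given key as the Idempotency-Key header, raising TransientError for 5xx and network failures and ClientError for 4xx. The jitter variant shown is "full jitter" (a random wait between 0 and the current backoff), which is one reasonable choice among several.

```python
import random
import time
import uuid
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical: raised by `send` for 5xx responses and network failures."""


class ClientError(Exception):
    """Hypothetical: raised by `send` for 4xx responses; never retried."""


def call_with_retries(
    send: Callable[[str], dict],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
    idempotency_key: Optional[str] = None,
) -> dict:
    """Run one logical operation, reusing a single idempotency key on every attempt."""
    key = idempotency_key or str(uuid.uuid4())  # minted once per logical operation
    for attempt in range(max_retries + 1):  # first try plus up to max_retries retries
        try:
            return send(key)  # ClientError (4xx) propagates immediately
        except TransientError:
            if attempt == max_retries:
                raise
            # Exponential backoff (1s, 2s, 4s, ... capped at 32s) with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Because the key is fixed before the loop, a retry that races with a slow but successful first attempt is answered from the cache instead of producing a second billed completion.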
Where to next?
- Idempotency (concept): the global idempotency pattern across all writes
- Chat overview: the full chat surface
- Chat streaming: SSE details and the streaming exception
- POST /v1/chat/completions: the endpoint reference

