POST /v1/chat/completions accepts an Idempotency-Key header (or Aurous-Idempotency-Key) on non-streamed requests. A successful key + body combination is cached server-side for 24 hours; any subsequent identical request returns the cached response with Aurous-Idempotent-Replayed: true. A request that reuses the key with a different body returns 409 idempotency_key_in_use. This is the same idempotency contract used across /v1/embeddings, /v1/images, and /v1/videos. See the global Idempotency page for the general pattern; this page covers the chat-specific details.

When to use it

Use an idempotency key whenever a network-level retry could otherwise cause a duplicate charge:
  • Your client library auto-retries on 5xx or network timeouts
  • You’re calling from a background job that resumes after a worker restart
  • You’re processing user-submitted content where the user might click “send” twice
  • You’re inside a database transaction that might roll back and replay
The pattern: mint a UUID (or any opaque value of 1-256 printable-ASCII characters, see Key format) per logical operation, pass it on every retry, and get the same response back without being billed again.
KEY=$(uuidgen)
curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $KEY" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [{ "role": "user", "content": "say hi" }],
    "max_tokens": 10
  }'

# Retry — returns the same response + Aurous-Idempotent-Replayed: true
curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $KEY" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [{ "role": "user", "content": "say hi" }],
    "max_tokens": 10
  }'
The replay returns instantly — no model call, no charge, no row mutation.

Response headers

The canonical signal that a response is a replay is Aurous-Idempotent-Replayed: true. On any FIRST successful idempotent call, the header is absent; on a replay of the same key + same body, the header is present and set to true. Aurous-Idempotency-Status is informational and currently emitted only on the streaming path:
  • Aurous-Idempotency-Status: ignored_streaming — on stream: true + Idempotency-Key, the key is ignored (see Streaming exception).
  • Aurous-Idempotency-Status: not_set — on stream: true with no key.
  • (Non-streaming responses do not currently emit this header; the presence or absence of Aurous-Idempotent-Replayed is the canonical signal. Surfacing an accepted status on non-streamed responses is on the v1.0.x roadmap.)
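The header rules above can be checked client-side with a small helper. This is a minimal sketch, assuming your HTTP client exposes response headers as a mapping of strings; the function names are hypothetical and not part of any Aurous SDK.

```python
from typing import Mapping, Optional


def _normalized(headers: Mapping[str, str]) -> dict:
    # HTTP header names are case-insensitive, so lower-case them before lookup.
    return {name.lower(): value for name, value in headers.items()}


def is_replay(headers: Mapping[str, str]) -> bool:
    """True when the response carries Aurous-Idempotent-Replayed: true."""
    value = _normalized(headers).get("aurous-idempotent-replayed", "")
    return value.strip().lower() == "true"


def idempotency_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return Aurous-Idempotency-Status if present (streaming path only today)."""
    return _normalized(headers).get("aurous-idempotency-status")
```

A missing Aurous-Idempotent-Replayed header is treated as "not a replay", which matches the first-call behavior described above.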

Conflict semantics

If you send the SAME idempotency key with a DIFFERENT body within the 24-hour window, the platform returns 409 idempotency_key_in_use:
{
  "error": {
    "type": "invalid_request",
    "code": "idempotency_key_in_use",
    "message": "Idempotency-Key 'xyz...' was used 12 minutes ago with a different request body. Pass a unique key per logical operation, or pass the same body to replay the cached response.",
    "doc_url": "https://docs.aurous-labs.com/errors#idempotency_key_in_use",
    "request_id": "req_..."
  }
}
The body fingerprint covers the entire POST body — model, messages, tools, response_format, temperature, max_tokens, etc. A single-character change in the prompt is enough to mismatch.
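A client can separate this conflict from other failures by checking both the status code and the error code in the body. The sketch below assumes only the error shape shown above; the function name is hypothetical.

```python
import json


def is_key_conflict(status_code: int, body_text: str) -> bool:
    """True when a response is the 409 idempotency_key_in_use error."""
    if status_code != 409:
        return False
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Not JSON, so it cannot be the documented error shape.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "idempotency_key_in_use"
```

A conflict usually points at a bug in how keys are minted (one key reused for two logical operations), so it is worth logging rather than retrying.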

Cross-route conflicts

Idempotency keys are scoped to your team, not to a single route. Using the same key on POST /v1/chat/completions and then on POST /v1/embeddings is treated as a body mismatch (the routes have different bodies) and returns the same 409 idempotency_key_in_use error. Mint a fresh key per logical operation, OR scope your keys with a route prefix (chat-<uuid>, embed-<uuid>) if you’re concerned about collisions in your own code.
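The route-prefix convention can be sketched in a few lines. The helper name is hypothetical, and the prefixes are simply whatever labels you choose in your own code.

```python
import uuid


def scoped_key(route_prefix: str) -> str:
    """Mint a fresh key such as 'chat-<uuid>' for one logical operation."""
    return f"{route_prefix}-{uuid.uuid4()}"


chat_key = scoped_key("chat")
embed_key = scoped_key("embed")
```

Because each call mints a new UUID, the prefix is only a readability aid; uniqueness still comes from the UUID itself.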

Key format

  • 1-256 printable-ASCII characters; we recommend UUIDs (v4 or v7) or other opaque identifiers
  • Empty / whitespace-only keys are rejected with 400 invalid_request
  • Keys longer than 256 chars are rejected with 400 invalid_request
We don’t enforce a specific format — Idempotency-Key: my-job-2026-05-20-001 works fine. We just need it to be unique per logical operation.
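The format rules above are cheap to check before sending a request. The sketch below is a client-side pre-check with a hypothetical function name; it assumes "printable ASCII" means code points 0x20 through 0x7E, and the platform remains the authority on what it accepts.

```python
MAX_KEY_LENGTH = 256


def validate_key(key: str) -> str:
    """Return the key unchanged, or raise ValueError if it breaks the rules."""
    if not key.strip():
        raise ValueError("idempotency key is empty or whitespace-only")
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError("idempotency key is longer than 256 characters")
    # Assumption: printable ASCII is the range 0x20 (space) to 0x7E (~).
    if any(not (0x20 <= ord(ch) <= 0x7E) for ch in key):
        raise ValueError("idempotency key contains non-printable or non-ASCII characters")
    return key
```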

Header aliases

Two header names are accepted (case-insensitive):
  • Idempotency-Key: <value> (Stripe-style, recommended)
  • Aurous-Idempotency-Key: <value> (vendor-prefixed alias)
Send either one; we treat them as equivalent. If you send both, the platform uses whichever appears first in the request.

Streaming exception

POST /v1/chat/completions with stream: true and an Idempotency-Key header is allowed, but the idempotency does not apply — the platform emits Aurous-Idempotency-Status: ignored_streaming and a warning frame as the first SSE data line:
data: { "warning": { "code": "idempotency_key_ignored_on_streaming", "message": "Idempotency-Key was provided but ignored on a streamed request. Use stream: false for at-most-once semantics." } }

data: { "object": "chat.completion.chunk", ... }
...
data: [DONE]
The reason: a streamed response is a multi-frame transport that can be partially consumed, partially discarded by the client, or interrupted mid-flight. Replaying a partial stream from cache would either re-emit frames the client already saw (incorrect playback) or restart from frame 1 (different semantics from a fresh call). Neither is sound. The two valid patterns for at-most-once streaming:
  1. stream: false for the chat completion you need to be at-most-once (this pattern is used mostly by background jobs, where the streaming UX is irrelevant and the worker just needs the final assistant message)
  2. Client-side dedupe for streamed UI — track which cmp_<id> you’ve already shown the user; if the same logical operation retries, suppress the second stream
We may add a replay_on_idempotency option for streamed responses in v1.1 — it would cache the full final response and replay it as a single non-streamed frame on the retry. If you have a concrete use case, tell us at the feedback link.
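A streaming client can detect that its key was ignored by inspecting the first SSE data frame for the warning shown earlier in this section. The sketch below assumes the stream is available as an iterable of text lines; the function names are hypothetical.

```python
import json
from typing import Iterable, Optional

IGNORED_CODE = "idempotency_key_ignored_on_streaming"


def parse_sse_data(line: str) -> Optional[dict]:
    """Decode one 'data:' line; return None for blanks, [DONE], and non-data lines."""
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    decoded = json.loads(payload)
    return decoded if isinstance(decoded, dict) else None


def key_was_ignored(lines: Iterable[str]) -> bool:
    """True when the first data frame is the ignored-key warning."""
    for line in lines:
        frame = parse_sse_data(line)
        if frame is None:
            continue
        warning = frame.get("warning")
        # Only the first decodable frame matters: the warning is sent first.
        return isinstance(warning, dict) and warning.get("code") == IGNORED_CODE
    return False
```

In practice you would run this check on the first frame while still forwarding later chunks to the UI; the Aurous-Idempotency-Status: ignored_streaming response header carries the same signal.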

Window + storage

  • Idempotency keys are cached for 24 hours after first use. After 24 hours the key is forgotten — sending the same key + same body at hour 25 mints a NEW completion and bills it.
  • Caches are scoped per-team. Two different teams can use the same opaque key value without conflict.
  • The cache stores the full response body — including cmp_<id>, the chat content, and the usage block — so the replay is semantically equivalent to the original. Field ordering may differ between the original and the replay (the replay reconstructs the JSON from the stored cache, not from the original serialization), but every value is byte-identical and clients that parse JSON (which is everyone) are unaffected.
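The contract described on this page (replay, conflict, 24-hour expiry, per-team scoping) can be summarized with a toy in-memory model. This is an illustration for reasoning about client behavior, not the platform's implementation: the class name, the SHA-256 fingerprint over sorted JSON, and the storage layout are all assumptions, and it only models calls that succeed.

```python
import hashlib
import json
import time

WINDOW_SECONDS = 24 * 60 * 60


class IdempotencyModel:
    """Toy model of the documented contract; not how the platform stores keys."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # (team, key) -> (fingerprint, stored_at, response)

    def handle(self, team, key, body, run):
        """Return (status, response, replayed) for one keyed request."""
        now = self._clock()
        # Assumption: the fingerprint is a hash of the whole request body.
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(encoded).hexdigest()
        entry = self._entries.get((team, key))
        if entry is not None and now - entry[1] < WINDOW_SECONDS:
            if entry[0] != fingerprint:
                return 409, {"error": {"code": "idempotency_key_in_use"}}, False
            return 200, entry[2], True  # replay: run() is not called again
        response = run(body)  # first use, or the old entry has expired
        self._entries[(team, key)] = (fingerprint, now, response)
        return 200, response, False
```

Passing a fake clock makes the hour-25 case easy to exercise: the same key and body after the window call run() again, which corresponds to a new, billed completion.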

Combining idempotency with retries

The recommended retry pattern for chat completions:
  1. Mint a UUID per logical operation
  2. Use it on every retry of that operation
  3. Retry on 5xx and network errors, but NOT on 4xx (a 4xx means the request itself was rejected, for example a malformed body or a 409 idempotency_key_in_use, so retrying it unchanged won’t help)
  4. Exponential backoff with jitter — start at 1s, double up to 32s
  5. Cap retries at 5 (chat completions take ~1-15s; 5 retries over ~60s is generous)
The OpenAI SDK’s built-in retry logic (in both Node and Python) honors any header you pass, so setting idempotencyKey once and letting the SDK retry is the simplest pattern.
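If you are not using an SDK with built-in retries, the five steps above can be sketched as a small loop. Everything here is an assumption for illustration: send is a hypothetical callable you supply that performs the POST with the given headers, raises TransientError for 5xx and network failures, and raises any other exception for 4xx so that it is not retried.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical marker for 5xx responses and network failures."""


def backoff_delays(max_retries=5, base=1.0, cap=32.0):
    """Yield exponential delays in seconds, doubling from base and capped at cap."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_retries(send, body, max_retries=5, sleep=time.sleep, rng=random.random):
    """Call send(body, headers), reusing one idempotency key across all retries."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # one key per logical operation
    delays = backoff_delays(max_retries)
    while True:
        try:
            return send(body, headers)
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted; surface the last transient error
            sleep(delay * rng())  # jitter: wait a random fraction of the delay
```

The key is minted once, outside the loop, so every attempt carries the same Idempotency-Key and a retry after a lost response replays the cached completion instead of creating a new one.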
