Set stream: true on POST /v1/chat/completions and the response switches from application/json to text/event-stream. Tokens are emitted as they’re generated; the final chunk carries the usage block (including credits_charged) before the standard data: [DONE] terminator.

Frame format

Each frame is a single data: line followed by a blank line:
data: {"id":"cmp_...","object":"chat.completion.chunk","created":1731948000,"model":"aurous-grow-2.0-pro","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"cmp_...","object":"chat.completion.chunk","created":1731948000,"model":"aurous-grow-2.0-pro","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"cmp_...","object":"chat.completion.chunk","created":1731948000,"model":"aurous-grow-2.0-pro","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"cmp_...","object":"chat.completion.chunk","created":1731948000,"model":"aurous-grow-2.0-pro","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_charged":0.0117,"breakdown":{"input_credits":0.0009,"output_credits":0.0108,"model":"aurous-grow-2.0-pro","pricing_version":7}}}

data: [DONE]
A keep-alive comment frame (: keep-alive) is sent whenever 15 seconds pass without a content chunk. SSE clients ignore comment lines, so it keeps the connection warm without affecting parsing.
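The frame format above can be consumed without an SDK. The following is a minimal sketch, not a reference client: it assumes one data: line per frame (as described above), and the function name parse_sse and the trimmed sample payloads are illustrative only.

```python
import json


def parse_sse(raw: str):
    """Collect generated text and the usage block from a raw SSE body."""
    text_parts = []
    usage = None
    for line in raw.splitlines():
        # Blank lines separate frames; lines starting with ":" are comments
        # such as ": keep-alive" and carry no data.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue  # other SSE fields are not used in this sketch
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # standard terminator
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        # Only the final chunk carries usage; keep the last one seen.
        usage = chunk.get("usage", usage)
    return "".join(text_parts), usage


# Trimmed-down frames modeled on the example above (ids and model omitted).
sample = (
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}\n\n'
    ': keep-alive\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}\n\n'
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_charged":0.0117}}\n\n'
    'data: [DONE]\n\n'
)

text, usage = parse_sse(sample)
print(text)
print(usage["credits_charged"])
```

Frames without a choices array contribute no text, so non-chunk payloads pass through this loop harmlessly.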

Headers on a streamed response

  • Content-Type: text/event-stream
  • Cache-Control: no-cache, no-transform
  • Aurous-Request-Id: req_<ulid> — quote in support tickets.
  • Aurous-Version: YYYY-MM-DD — the contract version applied.
  • Aurous-Idempotency-Status: ignored_streaming — if you sent Idempotency-Key. The header is recorded but the key is NOT stored; the first SSE frame will be a warning (see Idempotency on streamed requests).
  • X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset — RPM bucket.
  • X-RateLimit-TPM-Limit / X-RateLimit-TPM-Remaining / X-RateLimit-TPM-Reset — TPM (tokens-per-minute) bucket.

Example: streamed completion

curl -N -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [{ "role": "user", "content": "Write a haiku about latency." }],
    "stream": true,
    "max_tokens": 256
  }'

Cancel an in-flight stream

Two ways to stop a stream:
  1. Close the connection — tear down the TCP/HTTP connection on the client. The server detects the disconnect, waits up to 5 seconds for the upstream provider’s final usage chunk, commits actuals up to the abort point, releases the remainder of the hold, and flips the row to cancelled_client_disconnect.
  2. Call the cancel endpoint: POST /v1/chat/completions/{id}/cancel. Aborts the upstream connection, commits actuals from chunks already delivered, releases the remainder, flips the row to cancelled_by_request, and returns the final-state record:
{
  "id": "cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM",
  "object": "chat.completion",
  "status": "cancelled_by_request",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20,
    "credits_charged": 0.0069
  }
}
The cancel endpoint distinguishes three error cases, which is useful for retry decisions:
Code | HTTP | When
chat_cancel_target_not_found | 404 | id doesn’t exist for your team.
chat_cancel_target_already_terminal | 409 | Already completed, failed, or cancelled. Idempotency hint, not a bug.
chat_cancel_target_not_cancellable | 409 | Record exists and isn’t terminal but the stream can’t be aborted (e.g. a sync call that already returned).
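One way to turn the codes above into a retry decision is sketched below. The function name cancel_outcome and the returned labels are hypothetical, and because the error body's shape is not specified here, the caller is assumed to have extracted the code string already.

```python
from typing import Optional


def cancel_outcome(http_status: int, error_code: Optional[str]) -> str:
    """Map a cancel response to a client-side action (labels are illustrative)."""
    if 200 <= http_status < 300:
        return "cancelled"          # final-state record was returned
    if error_code == "chat_cancel_target_already_terminal":
        return "already_done"       # safe to treat as success; do not retry
    if error_code == "chat_cancel_target_not_found":
        return "check_id"           # wrong id or wrong team; retrying won't help
    if error_code == "chat_cancel_target_not_cancellable":
        return "not_cancellable"    # nothing to abort; retrying won't help
    return "unknown"                # anything not covered by the table above


print(cancel_outcome(409, "chat_cancel_target_already_terminal"))
```

None of the three documented codes calls for retrying the cancel itself; the distinction mainly decides whether the UI should treat the stream as finished.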

AbortController pattern (Node)

The OpenAI SDK accepts an AbortSignal, so you can wire the same controller into both the request and a “Stop” button in your UI:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!,
});

const controller = new AbortController();
// Stop button handler: controller.abort();

// Declared outside the try block so the catch handler below can read it.
let completionId: string | undefined;

try {
  const stream = await client.chat.completions.create(
    {
      model: "aurous-grow-2.0-pro",
      messages: [{ role: "user", content: "Long answer please" }],
      stream: true,
      max_tokens: 4096,
    },
    { signal: controller.signal },
  );

  for await (const chunk of stream) {
    completionId ??= chunk.id;
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
} catch (err) {
  if (controller.signal.aborted && completionId) {
    // Optional: confirm cancellation server-side and commit actuals.
    await fetch(
      `https://api.aurous-labs.com/v1/chat/completions/${completionId}/cancel`,
      {
        method: "POST",
        headers: { Authorization: `Bearer ${process.env.AUROUS_API_KEY}` },
      },
    );
  } else {
    throw err;
  }
}
Aborting via the controller alone tears down the connection, which is enough to commit partial actuals. Calling /cancel afterward is optional — useful when you want to confirm the final-state record before clearing it from your UI.

Stop-event pattern (Python)

import os
import threading
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key=os.environ["AUROUS_API_KEY"],
)

# A simple "stop" mechanism shared between threads.
stop_event = threading.Event()

stream = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[{"role": "user", "content": "Long answer please"}],
    stream=True,
    max_tokens=4096,
)

completion_id = None
for chunk in stream:
    completion_id = completion_id or chunk.id
    if stop_event.is_set():
        stream.close()  # closes the underlying HTTP connection
        break
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
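The Python loop above stops at closing the connection. To mirror the optional /cancel confirmation from the Node example, something like the following could run after the loop. This is a sketch using only the standard library; the helper names build_cancel_request and confirm_cancel are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://api.aurous-labs.com/v1"


def build_cancel_request(completion_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST to the cancel endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions/{completion_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def confirm_cancel(completion_id: str, api_key: str) -> dict:
    """Send the cancel request and return the final-state record.

    urllib raises urllib.error.HTTPError on 404/409 responses, so callers
    should be prepared to catch it.
    """
    request = build_cancel_request(completion_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage after the streaming loop (not executed here):
#   if stop_event.is_set() and completion_id:
#       record = confirm_cancel(completion_id, os.environ["AUROUS_API_KEY"])
#       print(record["status"], record["usage"]["credits_charged"])
```

Since closing the connection already commits partial actuals, this call only matters when the final-state record is needed before the UI discards the request.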

Idempotency on streamed requests

Streamed requests cannot be replayed deterministically, so Idempotency-Key is intentionally not stored when stream: true. The server still acknowledges the header by returning Aurous-Idempotency-Status: ignored_streaming, and emits a warning frame as the FIRST data line so SDK callers see it before any token chunks:
data: {"warning":{"code":"idempotency_key_ignored_on_streaming","message":"Idempotency-Key headers are ignored on streamed chat requests. Use stream=false for at-most-once semantics."}}
OpenAI-compatible clients won’t recognize the warning shape, but they won’t crash on it either — the field is ignored. If you need at-most-once delivery (typical for billing-sensitive integrations), set stream: false and pass Idempotency-Key as usual.
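Clients that read the raw stream can surface the warning instead of silently dropping it. The sketch below checks a single data: line for the warning shape shown above; the function name warning_code is hypothetical.

```python
import json
from typing import Optional


def warning_code(data_line: str) -> Optional[str]:
    """Return the warning code if the line is a warning frame, else None."""
    if not data_line.startswith("data:"):
        return None
    payload = data_line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    frame = json.loads(payload)
    warning = frame.get("warning")
    # Ordinary chunks have no "warning" key and fall through to None.
    return warning.get("code") if isinstance(warning, dict) else None


first_line = (
    'data: {"warning":{"code":"idempotency_key_ignored_on_streaming",'
    '"message":"Idempotency-Key headers are ignored on streamed chat requests. '
    'Use stream=false for at-most-once semantics."}}'
)
print(warning_code(first_line))
```

Logging the code once per request is usually enough to catch an integration that expects idempotent behavior from a streamed call.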

Client disconnects

If the client connection drops mid-stream (network blip, browser tab closed, server-side timeout):
  • The server keeps the upstream connection open for up to 5 seconds in case the provider’s final usage chunk arrives.
  • If usage arrives within the grace window, actuals are committed exactly.
  • If not, the server commits a chunk-count fallback estimate, logs the incident, and flips the row to cancelled_client_disconnect.
The row remains retrievable via GET /v1/chat/completions/{id} with its final committed usage.

Throughput & rate limits

Streamed chat requests draw from two buckets:
  • RPM (requests/minute) — X-RateLimit-Remaining headers.
  • TPM (tokens/minute) — X-RateLimit-TPM-Remaining headers. The bucket counts estimated tokens at request time; actuals adjust the bucket on commit.
Hitting either bucket returns 429 with Retry-After — see Rate limits.
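A client can use these headers to decide how long to back off. The sketch below makes two assumptions that this page does not confirm: Retry-After is expressed in seconds (HTTP also permits a date, in which case the fallback is used), and the Remaining headers are integer strings. The helper names are hypothetical.

```python
from typing import Mapping, Optional


def _lower(headers: Mapping[str, str]) -> dict:
    """Header names are case-insensitive, so normalize before lookup."""
    return {name.lower(): value for name, value in headers.items()}


def retry_delay(status: int, headers: Mapping[str, str],
                default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    try:
        # Assumption: Retry-After holds a number of seconds.
        return max(0.0, float(_lower(headers).get("retry-after", "")))
    except ValueError:
        return default  # header missing or not numeric


def exhausted_bucket(headers: Mapping[str, str]) -> Optional[str]:
    """Report which bucket hit zero: "rpm", "tpm", or None."""
    lowered = _lower(headers)
    if lowered.get("x-ratelimit-remaining", "").strip() == "0":
        return "rpm"
    if lowered.get("x-ratelimit-tpm-remaining", "").strip() == "0":
        return "tpm"
    return None


headers = {
    "Retry-After": "7",
    "X-RateLimit-Remaining": "12",
    "X-RateLimit-TPM-Remaining": "0",
}
print(retry_delay(429, headers), exhausted_bucket(headers))
```

Knowing which bucket ran out helps pick the remedy: fewer requests for RPM, or smaller prompts and lower max_tokens for TPM.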