POST /v1/chat/completions/{id}/cancel requests cancellation of an in-flight chat completion. It is useful when the user navigates away from a generating answer, when an upstream timeout fires, or when a downstream system signals “stop.” This is a request, not a guarantee: if the completion has already reached a terminal state (completed, failed, cancelled), the call returns 409 chat_cancel_target_already_terminal. If the completion is past the point of no return (the model has already emitted the full response, but the platform hasn’t recorded the final state yet), cancellation is best-effort and the full output may still be billed.

Request

curl -X POST https://api.aurous-labs.com/v1/chat/completions/cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM/cancel \
  -H "Authorization: Bearer $AUROUS_API_KEY"
The request takes no body. On success it returns 200 with the updated completion row, now in status: cancelled:
{
  "object": "chat.completion",
  "id": "cmp_01HXMQ7Z3K8Y2ABCDEFGHJKM",
  "status": "cancelled",
  "model": "aurous-grow-2.0-pro",
  "created_at": "2026-05-20T10:00:00Z",
  "cancelled_at": "2026-05-20T10:00:08Z",
  "choices": [
    {
      "index": 0,
      "finish_reason": "cancelled",
      "message": {
        "role": "assistant",
        "content": "The first part of the response that was generated before cancel..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 23,
    "total_tokens": 70,
    "credits_charged": 0.0167,
    "breakdown": {
      "model": "aurous-grow-2.0-pro",
      "input": { "text": 47, "cached_input_tokens": 0 },
      "output": { "text": 23, "reasoning": 0 }
    }
  }
}

Billing semantics

A cancelled completion bills only the tokens already committed to output at the moment cancellation took effect. Out of the credit hold reserved for the maximum potential cost, credits_charged is committed and the rest is released back to your available balance. The math is the same as for a completed completion, just with fewer completion_tokens. Cancellation never bills zero, because the prompt was already processed and is billed at the input rate. It never bills the full max_tokens worth of output either.

For streamed completions, the platform commits whichever tokens were already sent to the SSE client before the cancel signal reached the worker. Race conditions can cause the final committed-token count to differ from what the client received by 1-2 tokens; in that case we err on the side of NOT over-billing.
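As a rough illustration of the hold arithmetic described above, the sketch below splits a reserved hold into its committed and released parts. The hold amount (0.5000 credits) and the helper name settle_hold are assumptions made for this example; only the credits_charged value is taken from the sample response.

```python
from decimal import Decimal
from typing import Tuple


def settle_hold(hold: Decimal, credits_charged: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (committed, released) for a credit hold once a completion ends."""
    # Sanity check for this sketch only: a charge should fit inside the hold.
    if credits_charged > hold:
        raise ValueError("charge exceeds the reserved hold")
    committed = credits_charged
    released = hold - credits_charged
    return committed, released


# Hypothetical hold of 0.5000 credits; 0.0167 is credits_charged in the sample response.
committed, released = settle_hold(Decimal("0.5000"), Decimal("0.0167"))
print(committed, released)  # 0.0167 0.4833
```

Decimal is used here so that small credit amounts subtract exactly; the platform's own internal representation is not specified on this page.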

Effect on streamed clients

If you call cancel on an active streamed completion:
  • The SSE stream emits any in-flight frames already queued
  • A final data: { "object": "chat.completion.chunk", "choices": [{ "finish_reason": "cancelled" }] } frame is emitted
  • The data: [DONE]\n\n terminator follows
  • The TCP connection closes
The streamed client should treat finish_reason: "cancelled" the same as any other terminal finish_reason: append the partial assistant message to the conversation. No error toast is required.
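The frame sequence above could be consumed along these lines. This is a minimal sketch, not the platform's client: it assumes streamed text arrives under choices[].delta.content (the OpenAI-style chunk shape, which this page does not spell out), and collect_stream is a hypothetical helper name.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate streamed text and return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: nothing more to read
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumption: text deltas live under choices[].delta.content
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# Simulated frames: two text chunks, the cancelled frame, then the terminator.
frames = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "The first "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "part"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"finish_reason": "cancelled"}]}',
    "",
    "data: [DONE]",
]
text, reason = collect_stream(frames)
print(repr(text), reason)  # 'The first part' cancelled
```

The caller can then append text to the conversation as a normal assistant message whenever reason is any terminal value, "cancelled" included.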

Error modes

| Condition | Status | Code |
| --- | --- | --- |
| Target id doesn’t exist | 404 | chat_cancel_target_not_found |
| Target exists but in completed / failed / cancelled | 409 | chat_cancel_target_already_terminal |
| Target exists but in pending (model hasn’t started) | 409 | chat_cancel_target_not_cancellable |
| Target belongs to a different team than the calling API key | 404 | chat_cancel_target_not_found (disclosure-safe: we don’t leak existence) |
The not_cancellable case is narrow: a completion is in pending for ~50ms while the platform queues it to the provider. Once the model starts generating (processing), cancel is supported. If you hit not_cancellable, retry the cancel after a short backoff (~200ms).
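One way to implement that retry is sketched below. The transport is injected as a callable so the logic stays independent of any HTTP library; the (status, code) pair it returns, the attempt count of 3, and the name cancel_with_retry are assumptions for illustration, since this page does not define the error body shape.

```python
import time
from typing import Callable, List, Optional, Tuple

# (HTTP status, error code or None); this pair shape is an assumption.
CancelResult = Tuple[int, Optional[str]]


def cancel_with_retry(
    send_cancel: Callable[[], CancelResult],
    attempts: int = 3,
    backoff_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> CancelResult:
    """Retry only while the target is still pending; return the last result."""
    result = send_cancel()
    for _ in range(attempts - 1):
        status, code = result
        if not (status == 409 and code == "chat_cancel_target_not_cancellable"):
            break  # success, or an error that retrying will not fix
        sleep(backoff_s)
        result = send_cancel()
    return result


# Simulated server: still pending on the first call, cancellable on the second.
responses = iter([(409, "chat_cancel_target_not_cancellable"), (200, None)])
waits: List[float] = []
final = cancel_with_retry(lambda: next(responses), sleep=waits.append)
print(final, waits)  # (200, None) [0.2]
```

Other 409 and 404 results are returned without retrying, because a terminal or unknown target will not become cancellable later.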

Use cases

User navigates away mid-generation

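import { useEffect } from "react";

// Inside your component:
useEffect(() => {
  return () => {
    // Cleanup runs on unmount (and whenever currentCompletionId changes):
    // cancel the completion that was in flight.
    if (currentCompletionId) {
      fetch(`https://api.aurous-labs.com/v1/chat/completions/${currentCompletionId}/cancel`, {
        method: "POST",
        // Same bearer auth as the curl example above
        headers: { Authorization: `Bearer ${process.env.AUROUS_API_KEY!}` },
        // Let the request outlive the page if the user is navigating away
        keepalive: true,
      }).catch(() => {
        // Best effort: a failed cancel needs no user-facing error
      });
    }
  };
}, [currentCompletionId]);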

Upstream timeout fires before the stream completes

import time

import httpx
from openai import OpenAI

client = OpenAI(base_url="https://api.aurous-labs.com/v1", api_key="al_live_...")

stream = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[{"role": "user", "content": "Write a 10,000-word essay on..."}],
    max_tokens=8192,
    stream=True,
)

start = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
cmp_id = None
for chunk in stream:
    if cmp_id is None:
        cmp_id = chunk.id  # cmp_<ulid>
    # Note: the budget is only checked when a chunk arrives
    if time.monotonic() - start > 30:  # 30s budget
        # cancel: partial tokens billed, rest of the hold released
        # cast_to=httpx.Response returns the raw HTTP response for this custom path
        client.post(f"/chat/completions/{cmp_id}/cancel", body=None, cast_to=httpx.Response)
        break

Server-Sent Events client disconnect (automatic)

If the SSE client closes the TCP connection (browser tab closes, network dies, etc.), the platform detects the disconnect and automatically cancels the in-flight completion as if POST /cancel had been called. The same partial-billing semantics apply; the completion row ends in status: cancelled with cancelled_reason: client_disconnect in the metadata.
