POST /v1/chat/completions/{id}/cancel requests cancellation of an in-flight chat completion. Useful when the user navigates away from a generating answer, when an upstream timeout fires, or when a downstream system signals “stop.”
This is a request, not a guarantee — if the completion has already reached a terminal state (completed, failed, cancelled), the call returns 409 chat_cancel_target_already_terminal. If the completion is past the point of no return (the model has already emitted the full response, but the platform hasn’t recorded the final state yet), best-effort cancellation may still bill the full output.
Request
200 with the updated completion row (now in status: cancelled) on success.
Billing semantics
A cancelled completion bills only the tokens already committed to output at the moment cancellation took effect. The credit hold reserved for the maximum potential cost is committed forcredits_charged and the rest is released back to your available balance.
The math is the same as a succeeded completion — just with fewer completion_tokens. Cancellation never bills zero (the prompt was already processed and is billed at the input rate); it never bills the full max_tokens worth of output either.
For streamed completions, the platform commits whichever tokens were already sent to the SSE client before the cancel signal reached the worker. Race conditions can cause the final committed-token count to differ from what the client received by 1-2 tokens; we err on the side of NOT over-billing.
Effect on streamed clients
If you callcancel on an active streamed completion:
- The SSE stream emits any in-flight frames already queued
- A final
data: { "object": "chat.completion.chunk", "choices": [{ "finish_reason": "cancelled" }] }frame is emitted - The
data: [DONE]\n\nterminator follows - The TCP connection closes
finish_reason: "cancelled" the same as any other terminal finish_reason — pop the partial assistant message into the conversation, no error toast required.
Error modes
| Condition | Status | Code |
|---|---|---|
| Target id doesn’t exist | 404 | chat_cancel_target_not_found |
Target exists but in completed / failed / cancelled | 409 | chat_cancel_target_already_terminal |
Target exists but in pending (model hasn’t started) | 409 | chat_cancel_target_not_cancellable |
| Target belongs to a different team than the calling API key | 404 | chat_cancel_target_not_found (disclosure-safe — we don’t leak existence) |
not_cancellable case is narrow: a completion is in pending for ~50ms while the platform queues it to the provider. Once the model starts generating (processing), cancel is supported. If you hit not_cancellable, retry the cancel after a short backoff (~200ms).
Use cases
User navigates away mid-generation
Upstream timeout fires before the stream completes
Server-Sent Events client disconnect (automatic)
If the SSE client closes the TCP connection (browser tab closes, network dies, etc.), the platform detects the disconnect and automatically cancels the in-flight completion as ifPOST /cancel had been called. The same partial-billing semantics apply; the completion row ends in status: cancelled with cancelled_reason: client_disconnect in the metadata.
Where to next?
- Chat overview — the full chat surface
- Chat streaming — SSE behavior
POST /v1/chat/completions/{id}/cancel— endpoint reference

