Code: provider_rate_limited
HTTP status: 503
Type: rate_limit

When it fires

The upstream model provider returned a throttle response while serving your chat completion or embedding request. This is distinct from too_many_requests (the platform’s per-team RPM cap) and tpm_rate_limit_exceeded (the per-team TPM cap): this code means the upstream provider itself is throttling traffic, not that your team exceeded a platform limit.

How to recover

Retry with exponential backoff. When upstream sends a Retry-After header, it is forwarded to you; if it is present, wait at least that long before retrying. If you hit this error repeatedly, consider spreading load or contacting support.
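The recovery steps above can be sketched roughly as follows. This is a minimal illustration, not the platform's client: call_with_retry, parse_retry_after, and backoff_delay are hypothetical helper names, send stands in for whatever function issues your request, and the assumption that the error code appears under a "code" key in the JSON body is a guess about the response shape. The sketch only handles the numeric (seconds) form of Retry-After; the HTTP-date form falls through to backoff.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, parsed JSON body) -- a hypothetical shape.
Response = Tuple[int, Mapping[str, str], dict]


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or non-numeric."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None  # HTTP-date form is not handled in this sketch
    return None


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it is no longer throttled or attempts run out.

    Returns the last response either way, so the caller can inspect it.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Assumption: the error code is exposed as body["code"].
        throttled = status == 503 and body.get("code") == "provider_rate_limited"
        if not throttled or attempt == max_attempts - 1:
            return status, headers, body
        # Prefer the forwarded Retry-After value; otherwise back off exponentially.
        delay = parse_retry_after(headers)
        if delay is None:
            delay = backoff_delay(attempt)
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Jitter is included so that many clients throttled at the same moment do not all retry in lockstep. The base and cap values are illustrative defaults, not documented recommendations.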