Code: tpm_rate_limit_exceeded
HTTP status: 429
Type: rate_limit

When it fires

Chat completions and embeddings requests count against a tokens-per-minute (TPM) bucket in addition to the requests-per-minute (RPM) bucket. This error is returned when the TPM bucket is exhausted. The X-RateLimit-TPM-Limit, X-RateLimit-TPM-Remaining, and X-RateLimit-TPM-Reset headers report the bucket state on every response.
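A minimal Python sketch of reading the TPM headers from a response. It assumes that the Limit and Remaining values are plain integer token counts. The format and unit of the Reset value are not stated here, so the sketch keeps it as a raw string. The names `TpmBucketState`, `parse_tpm_headers`, and `has_budget_for` are hypothetical and are not part of any Aurous Labs SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TpmBucketState:
    limit: int      # assumed: integer token count per minute
    remaining: int  # assumed: integer tokens left in the current window
    reset: str      # kept raw; its unit/format is not documented here


def parse_tpm_headers(headers: Mapping[str, str]) -> Optional[TpmBucketState]:
    """Extract the TPM bucket state from response headers.

    Header names are matched case-insensitively, as HTTP requires.
    Returns None if any of the three headers is missing.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return TpmBucketState(
            limit=int(lowered["x-ratelimit-tpm-limit"]),
            remaining=int(lowered["x-ratelimit-tpm-remaining"]),
            reset=lowered["x-ratelimit-tpm-reset"],
        )
    except KeyError:
        return None


def has_budget_for(state: TpmBucketState, estimated_tokens: int) -> bool:
    # Hypothetical pre-flight check: compare a caller-supplied token
    # estimate with the remaining budget from the most recent response.
    return estimated_tokens <= state.remaining
```

For example, with illustrative (not real) values, `parse_tpm_headers({"X-RateLimit-TPM-Limit": "90000", "X-RateLimit-TPM-Remaining": "120", "X-RateLimit-TPM-Reset": "12"})` yields a state with 120 tokens remaining. A client could then defer a request whose estimated size exceeds that budget instead of sending it and receiving a 429.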

How to recover

Wait the number of seconds given in the Retry-After header of the 429 response, then retry. For sustained throughput above the default tier, contact support@aurous-labs.com. See Rate limits for the full bucket model.
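The recovery step can be sketched in Python as a small retry loop. This is an illustration under stated assumptions, not an official client. The transport is a hypothetical `send` callable that returns `(status, headers, body)`. Any 429 is treated the same way, because the body format used to distinguish `tpm_rate_limit_exceeded` from other rate-limit codes is not specified here. The names `retry_after_seconds` and `call_with_tpm_retry` and the 1-second fallback are assumptions.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport result: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds; fall back to `default`."""
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        # Retry-After may also be an HTTP-date; this sketch does not parse it.
        return default


def call_with_tpm_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, sleeping for Retry-After seconds after each 429.

    Returns the first non-429 response, or the last 429 once
    `max_attempts` calls have been made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_after_seconds(headers))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the loop can be tested without real delays. Capping attempts keeps a client from retrying indefinitely when its sustained load exceeds the tier's TPM limit. In that case, contacting support is the intended path.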