tpm_rate_limit_exceeded
HTTP status: 429
Type: rate_limit
When it fires
Chat completions and embeddings count against a tokens-per-minute (TPM) bucket in addition to the requests-per-minute (RPM) bucket. This code is returned when the TPM bucket is exhausted. The X-RateLimit-TPM-Limit, X-RateLimit-TPM-Remaining, and X-RateLimit-TPM-Reset headers report the bucket state on every response.
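Because the TPM headers arrive on every response, a client can track the bucket before it runs out. The following is a minimal sketch, not an official client: `TpmState` and `read_tpm_state` are hypothetical names, the limit and remaining values are assumed to be integers, and the reset value is kept as a raw string because its format is not specified on this page.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TpmState:
    """Snapshot of the TPM bucket as reported by response headers."""
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[str]  # kept raw: the format is an unknown here


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return the value as an int, or None if it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def read_tpm_state(headers: Mapping[str, str]) -> TpmState:
    """Extract the three TPM headers; header names are matched case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return TpmState(
        limit=_to_int(lowered.get("x-ratelimit-tpm-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-tpm-remaining")),
        reset=lowered.get("x-ratelimit-tpm-reset"),
    )
```

A caller might compare `remaining` against its own token estimate for the next request and delay the request when the estimate is larger.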
How to recover
Sleep for the number of seconds given in the Retry-After header of the response, then retry. For sustained throughput needs above the default tier, contact support@aurous-labs.com. See Rate limits for the full bucket model.
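The sleep-then-retry recovery can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, Retry-After is assumed to carry a number of seconds, and the one-second fallback and five-attempt cap are arbitrary choices. The sketch treats every 429 alike and does not inspect the error code in the body.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; fall back to `default` if absent or malformed."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default
    return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, sleeping for Retry-After seconds after each 429.

    Returns the first non-429 response, or the last 429 once
    `max_attempts` requests have been made.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_after_seconds(headers))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real delays. Capping the number of attempts means a sustained shortfall surfaces to the caller as a 429 instead of looping indefinitely.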
