POST /v1/embeddings/estimate takes the same payload as POST /v1/embeddings (minus two fields that don't apply to an estimate) and returns the projected token counts and credit charge, without calling the model and without billing your team. Use it to:
- Show a per-item cost in your UI before a customer clicks “Embed”.
- Project the total cost of a batch indexing job before kicking it off.
- Budget-gate a workload: skip items whose estimate exceeds your per-call cap.
- Sanity-check rate-card changes against your own corpus.
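The budget-gate use case can be sketched as a simple filter. This is a minimal illustration, assuming you already have a function that posts one item to the estimate endpoint and returns the decoded JSON; `estimate_one`, `budget_gate`, and `cap_credits` are hypothetical names. Only the `credits_estimated` field comes from the response shape documented on this page.

```python
from typing import Any, Callable, Iterable

# Hypothetical callable: posts one input to POST /v1/embeddings/estimate
# and returns the decoded JSON response as a dict.
EstimateFn = Callable[[Any], dict]


def budget_gate(items: Iterable[Any], estimate_one: EstimateFn,
                cap_credits: float) -> tuple[list[Any], list[Any]]:
    """Split items into (within_cap, skipped) using each item's credits_estimated."""
    within_cap: list[Any] = []
    skipped: list[Any] = []
    for item in items:
        estimate = estimate_one(item)
        # credits_estimated is the projected total charge for this single call.
        if estimate["credits_estimated"] <= cap_credits:
            within_cap.append(item)
        else:
            skipped.append(item)
    return within_cap, skipped
```

Because an estimate bills nothing, running this loop over a whole workload costs only request quota; items in `skipped` can be trimmed and re-estimated.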
Request shape
The body is identical to POST /v1/embeddings minus two fields that only matter when an embedding is actually returned or stored:
| Field | Status on /embeddings | Status on /embeddings/estimate |
|---|---|---|
| model | required | required |
| input | required (string or content-parts array) | required |
| dimensions | optional | optional |
| encoding_format | optional | omitted; not relevant for an estimate |
| user | optional | omitted; not stored on an estimate row |
The validation from POST /v1/embeddings runs here too: same DTO, same caps (16 parts max, 8 image_url parts max, 1M characters per text part, 128K-token context window). As of 2026-05-24, both endpoints reject video_url parts with embeddings_video_unsupported. A malformed payload returns the same typed error code the live endpoint would return.
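A client can derive the estimate body from a live request body and mirror some of the documented checks before sending anything. The sketch below is a client-side pre-check, not the server's validator: the 16-part and 8-`image_url` caps, the rejected `string[]` batch, and the rejected `video_url` type come from this page, while the content-part shape (a dict with a `type` key) is an assumption. It skips the per-part character cap and the 128K-token context check, which depend on server-side limits and tokenization.

```python
import copy

MAX_PARTS = 16        # documented cap on content parts per request
MAX_IMAGE_PARTS = 8   # documented cap on image_url parts per request
# Accepted by POST /v1/embeddings but omitted from the estimate body.
LIVE_ONLY_FIELDS = ("encoding_format", "user")


def to_estimate_body(live_body: dict) -> dict:
    """Copy a live /v1/embeddings body, dropping the fields the estimate omits."""
    body = copy.deepcopy(live_body)
    for field in LIVE_ONLY_FIELDS:
        body.pop(field, None)
    return body


def precheck(body: dict) -> list[str]:
    """Return the error codes these documented checks would trigger (empty = OK)."""
    inp = body.get("input")
    if not isinstance(inp, list):
        return []  # plain strings (and unknown shapes) are left to the server
    if inp and all(isinstance(x, str) for x in inp):
        return ["embeddings_batch_not_supported"]  # string[] batches are rejected
    codes: list[str] = []
    types = [part.get("type") for part in inp if isinstance(part, dict)]
    if len(inp) > MAX_PARTS or types.count("image_url") > MAX_IMAGE_PARTS:
        codes.append("embeddings_input_too_many_items")
    if "video_url" in types:
        codes.append("embeddings_video_unsupported")
    return codes
```

The sketch collects every matching code so a caller can fix them in one pass; the server remains the authority on what it accepts.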
Quick start
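A minimal Python sketch using only the standard library (`urllib.request`) rather than `requests`. The base URL `https://api.example.com`, the `Authorization: Bearer` header, and the model slug are placeholders, not values documented on this page; substitute your own. Building the request is kept separate from sending it so the request can be inspected without a network call.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: use your actual API host


def build_estimate_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/embeddings/estimate request."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/embeddings/estimate",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def estimate(api_key: str, model: str, text: str) -> dict:
    """Send the estimate request and return the decoded response envelope."""
    req = build_estimate_request(api_key, model, text)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `estimate("YOUR_API_KEY", "your-embedding-model", "hello world")` returns the envelope described under Response shape, where `credits_estimated` is the projected charge and `tokens.total` the projected token count.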
Response shape
| Field | Notes |
|---|---|
| estimated | Literal true. Distinguishes the estimate envelope from a real embedding response (which uses object: "list"). |
| tokens.text | Estimated text-token count after tokenization. |
| tokens.image | Estimated visual-token contribution from image_url parts. |
| tokens.video | Deprecated 2026-05-24; always 0. Retained on the response shape for one release cycle so existing SDKs that read it don’t break; video_url parts now return embeddings_video_unsupported. |
| tokens.total | Sum of the three modalities; the value that would surface as usage.prompt_tokens on the live call. |
| credits_estimated | Projected total credits. Matches the per-modality math walked through in Pricing. |
| breakdown.input.{text,visual,video} | Per-modality credit decomposition, identical in shape to usage.breakdown.input on the live call. |
| breakdown.model | Echoes the requested model slug inside the breakdown block. |
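A client reading this envelope can check the two invariants the table states: `estimated` is literally `true`, and `tokens.total` equals the sum of the modality counts. The sketch below is illustrative; `EmbeddingEstimate` and `parse_estimate` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingEstimate:
    model: str
    text_tokens: int
    image_tokens: int
    total_tokens: int
    credits_estimated: float


def parse_estimate(payload: dict) -> EmbeddingEstimate:
    """Parse an estimate envelope, rejecting anything that is not one."""
    # A live /v1/embeddings response carries object: "list" instead of estimated: true.
    if payload.get("estimated") is not True:
        raise ValueError("not an estimate envelope")
    tokens = payload["tokens"]
    # tokens.video is deprecated and always 0; default it in case it is later removed.
    video = tokens.get("video", 0)
    if tokens["total"] != tokens["text"] + tokens["image"] + video:
        raise ValueError("tokens.total does not equal the sum of the modalities")
    return EmbeddingEstimate(
        model=payload["breakdown"]["model"],
        text_tokens=tokens["text"],
        image_tokens=tokens["image"],
        total_tokens=tokens["total"],
        credits_estimated=payload["credits_estimated"],
    )
```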
Multimodal estimate
The estimate endpoint accepts content-parts arrays the same way /v1/embeddings does. For a mixed text + image input, tokens.image and breakdown.input.visual are populated, and credits_estimated reflects the multimodal total.
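A mixed text + image estimate body might be built as follows. The content-part field names (`type`, `text`, `image_url.url`) follow the common OpenAI-style convention and are an assumption here; confirm the exact shape on the Multimodal page. The helper name and model slug are hypothetical.

```python
import json


def multimodal_estimate_body(model: str, caption: str, image_urls: list[str]) -> dict:
    """Build an estimate body with one text part followed by image_url parts."""
    # Assumed content-part shape; verify against the Multimodal page.
    parts: list[dict] = [{"type": "text", "text": caption}]
    parts += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"model": model, "input": parts}


body = multimodal_estimate_body(
    "your-embedding-model",  # placeholder slug
    "red running shoe, side view",
    ["https://example.com/shoe.jpg"],
)
print(json.dumps(body, indent=2))
```

The body is sent to POST /v1/embeddings/estimate exactly like a plain-string estimate, and the same 16-part and 8-image_url caps apply.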
Estimates vs. actual charges
Estimates are computed from a pre-fetch tokenization pass. For URL-fetched images, the real charge on POST /v1/embeddings may differ by a few percent: the platform fetches and re-tokenizes images at request time, and the provider’s actual token count can come back slightly above or below the pre-fetch estimate. Plain-string text input is estimated exactly.
In practice:
- Budgeting a batch: treat the sum of estimates as your baseline, and leave a few percent of headroom for URL-fetched images.
- Per-request cost UI: render the estimate; show the actual credits_charged on success.
- Reconciliation: compare your accumulated estimates against GET /v1/usage for the same period; small drift is expected and benign.
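The budgeting and reconciliation guidance reduces to a little arithmetic. In this sketch the 5% headroom is an assumed stand-in for "a few percent"; tune it from your own reconciliation history. How you read `actual_credits` out of GET /v1/usage is left out, since that response shape is not described here, and both function names are hypothetical.

```python
def batch_budget(exact_credits: float, url_media_credits: float,
                 media_headroom: float = 0.05) -> float:
    """Projected ceiling for a batch.

    exact_credits: summed estimates for plain-string text input (estimated exactly).
    url_media_credits: summed estimates attributable to URL-fetched images,
    which can drift a few percent either way at request time.
    """
    # 0.05 is an assumed margin; adjust it from observed drift.
    return exact_credits + url_media_credits * (1 + media_headroom)


def estimate_drift(estimated_credits: float, actual_credits: float) -> float:
    """Relative drift of actual charges vs. accumulated estimates (0.02 means +2%)."""
    if estimated_credits == 0:
        raise ValueError("no estimates to compare against")
    return (actual_credits - estimated_credits) / estimated_credits
```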
No row, no hold, no billing event
A successful estimate call:
- Does not create an embedding row.
- Does not place a hold on your balance.
- Does not emit a billing event.
- Counts only against your RPM rate-limit bucket. It does not consume from your TPM bucket.
Errors
Estimates run all the same input checks as the live endpoint, so the same codes apply:
- embeddings_batch_not_supported (400): input is a string[] array. See Multimodal.
- embeddings_input_too_many_items (400): over the 16-part or 8-image cap. Split into multiple requests.
- embeddings_video_unsupported (400): any video_url part is rejected. Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
- embeddings_input_too_large (400): the pre-fetch token estimate exceeds the context window. Trim the input.
- embeddings_unsupported_dimensions (400): the requested dimensions value is not supported. See Dimensions.
- model_not_found (404): unknown model slug. List models with GET /v1/models.
- model_disabled (403): the model exists but an admin has deactivated it.
- model_wrong_kind (400): you sent a chat model to the embeddings estimate endpoint.
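For embeddings_input_too_many_items, the remedy is to split the content-parts array across several requests. A minimal greedy splitter is sketched below; it assumes each part is a dict with a `type` key (an assumption about the part shape) and keeps every chunk within both documented caps. Each chunk becomes a separate request and therefore a separate embedding, so split only where your application can treat the pieces independently. `split_parts` is a hypothetical helper.

```python
MAX_PARTS = 16        # documented cap on content parts per request
MAX_IMAGE_PARTS = 8   # documented cap on image_url parts per request


def split_parts(parts: list[dict]) -> list[list[dict]]:
    """Greedily pack parts, in order, into chunks within both caps."""
    chunks: list[list[dict]] = []
    current: list[dict] = []
    images = 0
    for part in parts:
        is_image = part.get("type") == "image_url"
        # Close the current chunk if adding this part would break either cap.
        if len(current) == MAX_PARTS or (is_image and images == MAX_IMAGE_PARTS):
            chunks.append(current)
            current, images = [], 0
        current.append(part)
        if is_image:
            images += 1
    if current:
        chunks.append(current)
    return chunks
```

Order is preserved, so concatenating the chunks reproduces the original array; estimate each chunk separately and sum the results for a total.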
Related
- Overview: the live POST /v1/embeddings endpoint.
- Pricing: the credit math the estimate is based on.
- Multimodal: input shape for mixed text + image + video.
- Errors: full error taxonomy.

