POST /v1/embeddings/estimate takes the same payload as POST /v1/embeddings and returns the projected token counts and credit charge — without calling the model and without billing your team. Use it to:
  • Show a per-item cost in your UI before a customer clicks “Embed”.
  • Project the total cost of a batch indexing job before kicking it off.
  • Budget-gate a workload: skip items whose estimate exceeds your per-call cap.
  • Sanity-check rate-card changes against your own corpus.

Request shape

The body is identical to POST /v1/embeddings minus two fields that have no effect when no embedding is produced and no charge is made:
| Field | Status on /embeddings | Status on /embeddings/estimate |
| --- | --- | --- |
| model | required | required |
| input | required (string or content-parts array) | required |
| dimensions | optional | optional |
| encoding_format | optional | omitted (not relevant for an estimate) |
| user | optional | omitted (not stored on an estimate row) |
All the input validation that runs on POST /v1/embeddings runs here too: same DTO, same caps (16 parts max, 8 image_url parts max, 1M characters per text part, 128K-token context window). As of 2026-05-24, video_url parts are rejected with embeddings_video_unsupported on both endpoints. A malformed payload returns the same typed error code on the estimate endpoint that it would have produced on the live endpoint.
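The structural caps above can be approximated client-side before spending a request. The following is a minimal sketch, not the platform's validator: the function name `precheck_input`, the order of the checks, and the `local:text_part_too_long` label are assumptions made for illustration (the source does not name the error code for the per-part character cap, and applying that cap to a plain string is also an assumption). The 128K-token context window cannot be checked locally without the tokenizer, so it is left to the server.

```python
from typing import Optional, Union

# Caps quoted in the text above.
MAX_PARTS = 16
MAX_IMAGE_PARTS = 8
MAX_CHARS_PER_TEXT_PART = 1_000_000

# Hypothetical local label; NOT an API error code.
LOCAL_TEXT_TOO_LONG = "local:text_part_too_long"


def precheck_input(input_: Union[str, list]) -> Optional[str]:
    """Return a likely error code, or None if these local checks pass."""
    if isinstance(input_, str):
        # Assumption: the per-part character cap also applies to a plain string.
        if len(input_) > MAX_CHARS_PER_TEXT_PART:
            return LOCAL_TEXT_TOO_LONG
        return None
    # A list of bare strings is a batch, which the endpoint rejects.
    if any(isinstance(part, str) for part in input_):
        return "embeddings_batch_not_supported"
    # Any video_url part is rejected outright.
    if any(part.get("type") == "video_url" for part in input_):
        return "embeddings_video_unsupported"
    images = sum(1 for part in input_ if part.get("type") == "image_url")
    if len(input_) > MAX_PARTS or images > MAX_IMAGE_PARTS:
        return "embeddings_input_too_many_items"
    for part in input_:
        if part.get("type") == "text" and len(part.get("text", "")) > MAX_CHARS_PER_TEXT_PART:
            return LOCAL_TEXT_TOO_LONG
    return None
```

A `None` result only means the payload passed these local checks; the server remains the source of truth and may still reject it.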

Quick start

curl -X POST https://api.aurous-labs.com/v1/embeddings/estimate \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": "A 500-token product description for a leather messenger bag."
  }'
The OpenAI SDKs don’t ship an estimate helper, so call the endpoint directly with fetch / requests / axios.
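As one way to call the endpoint directly, here is a minimal Python sketch that uses only the standard library. The URL, headers, and body fields come from the curl example above; the helper names `build_estimate_request` and `estimate`, and the 30-second timeout, are assumptions made for illustration.

```python
import json
import urllib.request

ESTIMATE_URL = "https://api.aurous-labs.com/v1/embeddings/estimate"


def build_estimate_request(model, input_, api_key, dimensions=None):
    """Build the POST request without sending it."""
    body = {"model": model, "input": input_}
    if dimensions is not None:
        body["dimensions"] = dimensions
    # encoding_format and user are left out: the estimate endpoint omits them.
    return urllib.request.Request(
        ESTIMATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def estimate(model, input_, api_key, dimensions=None, timeout=30.0):
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_estimate_request(model, input_, api_key, dimensions)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (performs a real network call):
#   import os
#   result = estimate("aurous-embed-vision-1.0", "some text", os.environ["AUROUS_API_KEY"])
#   print(result["credits_estimated"])
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test without network access.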

Response shape

{
  "estimated": true,
  "tokens": {
    "text":  500,
    "image": 0,
    "video": 0,
    "total": 500
  },
  "credits_estimated": 0.009375,
  "breakdown": {
    "input": { "text": 0.009375, "visual": 0, "video": 0 },
    "model": "aurous-embed-vision-1.0"
  }
}
Fields:
| Field | Notes |
| --- | --- |
| estimated | Literal true. Distinguishes the estimate envelope from a real embedding response (which uses object: "list"). |
| tokens.text | Estimated text-token count after tokenization. |
| tokens.image | Estimated visual-token contribution from image_url parts. |
| tokens.video | Deprecated 2026-05-24; always 0. Retained on the response shape for one release cycle so existing SDKs that read it don’t break; video_url parts now return embeddings_video_unsupported. |
| tokens.total | Sum of the three modalities; this is what would surface as usage.prompt_tokens on the live call. |
| credits_estimated | Projected total credits. Matches the per-modality math walked in Pricing. |
| breakdown.input.{text,visual,video} | Per-modality credit decomposition, identical shape to usage.breakdown.input on the live call. |
| breakdown.model | Echoes the requested model slug, inside the breakdown block. |
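A client that handles both envelopes can branch on `estimated` and sanity-check the totals. The sketch below is illustrative only: the function names are hypothetical, and the check that `credits_estimated` equals the sum of `breakdown.input` is an assumption inferred from the sample response, not a documented guarantee. The token-sum check follows the field description for tokens.total.

```python
import math


def is_estimate(body: dict) -> bool:
    """True for the estimate envelope, False for a live embedding response."""
    return body.get("estimated") is True


def check_estimate(body: dict) -> list:
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    tokens = body["tokens"]
    # Documented: tokens.total is the sum of the three modalities.
    if tokens["total"] != tokens["text"] + tokens["image"] + tokens["video"]:
        problems.append("tokens.total is not the sum of the modalities")
    # Assumption: credits_estimated is the sum of the per-modality credits.
    per_modality = body["breakdown"]["input"]
    if not math.isclose(
        body["credits_estimated"], sum(per_modality.values()), rel_tol=1e-9, abs_tol=1e-12
    ):
        problems.append("credits_estimated does not match breakdown.input")
    return problems


# The sample response from this page.
sample = {
    "estimated": True,
    "tokens": {"text": 500, "image": 0, "video": 0, "total": 500},
    "credits_estimated": 0.009375,
    "breakdown": {
        "input": {"text": 0.009375, "visual": 0, "video": 0},
        "model": "aurous-embed-vision-1.0",
    },
}
```

Credits are compared with a tolerance rather than `==` because they are floating-point values.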

Multimodal estimate

The estimate endpoint accepts content-parts arrays the same way /v1/embeddings does:
curl -X POST https://api.aurous-labs.com/v1/embeddings/estimate \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": [
      { "type": "text", "text": "Product photo of a vintage leather messenger bag with brass buckles." },
      {
        "type": "image_url",
        "image_url": { "url": "https://assets.aurous-labs.com/example-images/messenger-bag.jpg" }
      }
    ]
  }'
The response carries the same envelope; the tokens.image and breakdown.input.visual values are populated, and credits_estimated reflects the multimodal total.

Estimates are an upper bound

Estimates are computed from a pre-fetch tokenization pass. The real charge on POST /v1/embeddings may differ by a few percent for URL-fetched media: the platform fetches and re-tokenizes images at request time, and the provider’s actual token count can come back slightly above or below the pre-fetch estimate. Plain-string text input is exact. In practice:
  • Budgeting a batch — treat the sum of estimates as your worst case, and leave a few percent of headroom if the batch includes URL-fetched media.
  • Per-request cost UI — render the estimate; show the actual credits_charged on success.
  • Reconciliation — compare your accumulated estimates vs the same period’s GET /v1/usage; small drift is expected and benign.
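The budgeting and reconciliation practices above can be sketched in a few lines. This is a minimal illustration, assuming you have already collected one `credits_estimated` value per item; the function names, the `per_call_cap` parameter, and the 5% default `headroom` are hypothetical choices, not platform settings.

```python
def plan_batch(estimates, per_call_cap, headroom=0.05):
    """Split items by a per-call cap and project the batch total.

    estimates: list of credits_estimated values, one per item.
    headroom:  assumed safety margin for drift on URL-fetched media.
    """
    accepted = [i for i, credits in enumerate(estimates) if credits <= per_call_cap]
    skipped = [i for i, credits in enumerate(estimates) if credits > per_call_cap]
    projected = sum(estimates[i] for i in accepted)
    return {
        "accepted": accepted,    # indexes to send to POST /v1/embeddings
        "skipped": skipped,      # indexes over the cap
        "projected": projected,  # sum of estimates for accepted items
        "budget": projected * (1 + headroom),
    }


def drift_ratio(estimated_total, actual_total):
    """Relative difference between charged and estimated credits for a period."""
    if estimated_total == 0:
        return 0.0
    return (actual_total - estimated_total) / estimated_total
```

For reconciliation, `actual_total` would come from the same period's GET /v1/usage; a ratio of a few percent in either direction is consistent with the drift described above.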

No row, no hold, no billing event

A successful estimate call:
  • Does not create an embedding row.
  • Does not place a hold on your balance.
  • Does not emit a billing event.
  • Counts only against your RPM rate-limit bucket. It does not consume from your TPM bucket.
A 4xx from the estimate endpoint behaves like any other validation 4xx — same typed error envelope, no row, no charge.

Errors

Estimates run all the same input checks as the live endpoint, so the same codes apply:
  • embeddings_batch_not_supported (400) — input is a string[] array. See Multimodal.
  • embeddings_input_too_many_items (400) — over the 16-part or 8-image cap. Split into multiple requests.
  • embeddings_video_unsupported (400) — any video_url part is rejected. Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
  • embeddings_input_too_large (400) — pre-fetch token estimate is over the context window. Trim input.
  • embeddings_unsupported_dimensions (400) — the requested dimensions value is not supported. See Dimensions.
  • model_not_found (404) — unknown model slug. List models with GET /v1/models.
  • model_disabled (403) — the model exists but an admin has deactivated it.
  • model_wrong_kind (400) — you sent a chat model to the embeddings estimate endpoint.
Server errors (5xx) on the estimate endpoint are platform issues — retry with backoff. They do not consume rate-limit budget.
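Retrying 5xx responses with backoff could look like the sketch below. It is an illustration under stated assumptions: `call_with_backoff` is a hypothetical helper, the attempt count and delays are arbitrary defaults, and `send` is any zero-argument callable that raises `urllib.error.HTTPError` on failure (as `urllib.request.urlopen` does). 4xx errors are validation failures and are re-raised immediately.

```python
import random
import time
import urllib.error


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(), retrying only on 5xx with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # 4xx is a client-side problem: retrying will not help.
            # Also give up once the last attempt has failed.
            if err.code < 500 or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting.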

Related

  • Overview — the live POST /v1/embeddings endpoint.
  • Pricing — the credit math the estimate is based on.
  • Multimodal — input shape for mixed text + image + video.
  • Errors — full error taxonomy.