Estimate Cost Before Embedding

POST /v1/embeddings/estimate takes the same payload as POST /v1/embeddings and returns the projected token counts and credit charge — without calling the model and without billing your team. Use it to:

Show a per-item cost in your UI before a customer clicks “Embed”.
Project the total cost of a batch indexing job before kicking it off.
Budget-gate a workload: skip items whose estimate exceeds your per-call cap.
Sanity-check rate-card changes against your own corpus.
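As a rough illustration of the budget-gating case, the sketch below splits items by their estimated cost. It assumes each estimate has already been fetched and parsed into a dict carrying the `credits_estimated` field described under Response shape. `split_by_cap`, the item ids, and the cap value are hypothetical names and numbers chosen for the example.

```python
from typing import Iterable


def split_by_cap(
    estimates: Iterable[tuple[str, dict]], per_call_cap: float
) -> tuple[list[str], list[str]]:
    """Split item ids into (within_cap, over_cap) by estimated credits."""
    within, over = [], []
    for item_id, estimate in estimates:
        # `credits_estimated` is the projected total from the estimate response.
        if estimate["credits_estimated"] <= per_call_cap:
            within.append(item_id)
        else:
            over.append(item_id)
    return within, over


# Illustrative values only; real numbers come from the estimate endpoint.
sample = [
    ("sku-1", {"credits_estimated": 0.009375}),
    ("sku-2", {"credits_estimated": 0.25}),
]
to_embed, skipped = split_by_cap(sample, per_call_cap=0.05)
print(to_embed, skipped)
```

Items in `skipped` never reach the live endpoint, so they incur no charge.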

Request shape

The body is identical to POST /v1/embeddings minus two fields that have no bearing on an estimate:

Field	Status on `/embeddings`	Status on `/embeddings/estimate`
`model`	required	required
`input`	required (string or content-parts array)	required
`dimensions`	optional	optional
`encoding_format`	optional	omitted — not relevant for an estimate
`user`	optional	omitted — not stored on an estimate row
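One way to reuse a payload built for the live endpoint is to strip the two omitted fields before sending it to the estimate endpoint. This is a sketch, not platform-provided code: `to_estimate_payload` is a hypothetical helper, and the `dimensions`, `encoding_format`, and `user` values are placeholders. This page does not say whether the estimate endpoint rejects or ignores the extra fields, so the sketch removes them to be safe.

```python
# Fields the table above lists as omitted on /embeddings/estimate.
ESTIMATE_OMITTED_FIELDS = frozenset({"encoding_format", "user"})


def to_estimate_payload(embedding_payload: dict) -> dict:
    """Return a copy of a /v1/embeddings body without the omitted fields."""
    return {
        key: value
        for key, value in embedding_payload.items()
        if key not in ESTIMATE_OMITTED_FIELDS
    }


live_payload = {
    "model": "aurous-embed-vision-1.0",
    "input": "A 500-token product description for a leather messenger bag.",
    "dimensions": 512,            # placeholder value
    "encoding_format": "float",   # placeholder value
    "user": "customer-123",       # placeholder value
}
estimate_payload = to_estimate_payload(live_payload)
print(sorted(estimate_payload))
```

The original dict is left untouched, so the same object can still be sent to the live endpoint afterwards.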

All the input validation that runs on POST /v1/embeddings runs here too: same DTO, same caps (16 parts max, 8 image_url max, 1M chars per text part, 128K-token context window). video_url parts are rejected with embeddings_video_unsupported on both endpoints as of 2026-05-24. A malformed payload returns the same typed error code it would have generated on the live endpoint.
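If you want to fail fast before spending a request, the caps above can be mirrored client-side. The sketch below is an approximation under stated assumptions: `precheck_input` is a hypothetical helper, the server remains authoritative, the 128K-token window cannot be checked without the platform's tokenizer, and `local:text_part_over_1m_chars` is a local label rather than a server error code (this page does not name the code for that cap).

```python
MAX_PARTS = 16
MAX_IMAGE_PARTS = 8
MAX_TEXT_CHARS = 1_000_000


def precheck_input(value) -> list[str]:
    """Return likely rejection reasons for an `input` value (empty if none)."""
    if isinstance(value, str):
        parts = [{"type": "text", "text": value}]
    elif any(isinstance(part, str) for part in value):
        # An array of plain strings is a batch, which the endpoint rejects.
        return ["embeddings_batch_not_supported"]
    else:
        parts = list(value)

    problems = []
    image_count = sum(1 for part in parts if part.get("type") == "image_url")
    if len(parts) > MAX_PARTS or image_count > MAX_IMAGE_PARTS:
        problems.append("embeddings_input_too_many_items")
    if any(part.get("type") == "video_url" for part in parts):
        problems.append("embeddings_video_unsupported")
    if any(
        part.get("type") == "text" and len(part.get("text", "")) > MAX_TEXT_CHARS
        for part in parts
    ):
        problems.append("local:text_part_over_1m_chars")
    return problems


nine_images = [
    {"type": "image_url", "image_url": {"url": f"https://example.com/{i}.jpg"}}
    for i in range(9)
]
print(precheck_input(nine_images))
```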

Quick start

curl -X POST https://api.aurous-labs.com/v1/embeddings/estimate \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": "A 500-token product description for a leather messenger bag."
  }'

const res = await fetch("https://api.aurous-labs.com/v1/embeddings/estimate", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.AUROUS_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "aurous-embed-vision-1.0",
    input: "A 500-token product description for a leather messenger bag.",
  }),
});

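// Stop on 4xx/5xx so an error body is not read as an estimate.
if (!res.ok) throw new Error(`estimate request failed with status ${res.status}`);
const estimate = await res.json();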
console.log("would charge:", estimate.credits_estimated, "credits");
console.log("breakdown:", estimate.breakdown.input);

import os
import requests

res = requests.post(
    "https://api.aurous-labs.com/v1/embeddings/estimate",
    headers={
        "Authorization": f"Bearer {os.environ['AUROUS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "aurous-embed-vision-1.0",
        "input": "A 500-token product description for a leather messenger bag.",
    },
)
res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
estimate = res.json()
print("would charge:", estimate["credits_estimated"], "credits")
print("breakdown:", estimate["breakdown"]["input"])

The OpenAI SDKs don’t ship an estimate helper, so call the endpoint directly with fetch / requests / axios.

Response shape

{
  "estimated": true,
  "tokens": {
    "text":  500,
    "image": 0,
    "video": 0,
    "total": 500
  },
  "credits_estimated": 0.009375,
  "breakdown": {
    "input": { "text": 0.009375, "visual": 0, "video": 0 },
    "model": "aurous-embed-vision-1.0"
  }
}

Fields:

Field	Notes
`estimated`	Literal `true`. Distinguishes the estimate envelope from a real embedding response (which uses `object: "list"`).
`tokens.text`	Estimated text-token count after tokenization.
`tokens.image`	Estimated visual-token contribution from `image_url` parts.
`tokens.video`	Deprecated 2026-05-24 — always `0`. Retained on the response shape for one release cycle so existing SDKs that read it don’t break; `video_url` parts now return `embeddings_video_unsupported`.
`tokens.total`	Sum of the three modalities — what would surface as `usage.prompt_tokens` on the live call.
`credits_estimated`	Projected total credits. Matches the per-modality math walked through in Pricing.
`breakdown.input.{text,visual,video}`	Per-modality credit decomposition, identical shape to `usage.breakdown.input` on the live call.
`breakdown.model`	Echoes the requested model slug, inside the breakdown block.
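The field table can be turned into a small typed reader. The sketch below is one possible shape, not an official client: `Estimate` and `parse_estimate` are hypothetical names. It deliberately skips `tokens.video` and `breakdown.input.video`, since the table marks the former as deprecated, and it uses the `estimated` flag to refuse anything that is not an estimate envelope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    model: str
    text_tokens: int
    image_tokens: int
    total_tokens: int
    credits: float
    text_credits: float
    visual_credits: float


def parse_estimate(body: dict) -> Estimate:
    """Convert the JSON envelope into an Estimate, rejecting other responses."""
    # A live embedding response carries `object: "list"` instead of `estimated`.
    if body.get("estimated") is not True:
        raise ValueError("not an estimate envelope")
    tokens = body["tokens"]
    credits_by_modality = body["breakdown"]["input"]
    return Estimate(
        model=body["breakdown"]["model"],
        text_tokens=tokens["text"],
        image_tokens=tokens["image"],
        total_tokens=tokens["total"],
        credits=body["credits_estimated"],
        text_credits=credits_by_modality["text"],
        visual_credits=credits_by_modality["visual"],
    )


# The sample envelope shown earlier in this section.
sample = {
    "estimated": True,
    "tokens": {"text": 500, "image": 0, "video": 0, "total": 500},
    "credits_estimated": 0.009375,
    "breakdown": {
        "input": {"text": 0.009375, "visual": 0, "video": 0},
        "model": "aurous-embed-vision-1.0",
    },
}
estimate = parse_estimate(sample)
print(estimate.total_tokens, estimate.credits)
```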

Multimodal estimate

The estimate endpoint accepts content-parts arrays the same way /v1/embeddings does:

curl -X POST https://api.aurous-labs.com/v1/embeddings/estimate \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": [
      { "type": "text", "text": "Product photo of a vintage leather messenger bag with brass buckles." },
      {
        "type": "image_url",
        "image_url": { "url": "https://assets.aurous-labs.com/example-images/messenger-bag.jpg" }
      }
    ]
  }'

The response carries the same envelope; the tokens.image and breakdown.input.visual values are populated, and credits_estimated reflects the multimodal total.

Estimates are approximate

Estimates are computed from a pre-fetch tokenization pass. For URL-fetched media, the real charge on POST /v1/embeddings may differ by a few percent: the platform fetches and re-tokenizes images at request time, and the provider’s actual token count can come back slightly above or below the pre-fetch estimate. Plain-string text input is exact. In practice:

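Budgeting a batch: treat the sum of estimates as your planning figure, and allow a few percent of headroom when the batch includes URL-fetched images, since actual charges can land slightly above the estimate.
Per-request cost UI: render the estimate, then show the actual credits_charged on success.
Reconciliation: compare your accumulated estimates against GET /v1/usage for the same period; small drift is expected and benign.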
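The budgeting and reconciliation points come down to simple arithmetic, sketched below. `batch_budget` and `drift_percent` are hypothetical helpers. The optional `headroom` reflects the note above that actual charges for URL-fetched media can come back slightly above the estimate, and the 5% used in the example is an assumption, not a platform figure. The actual total would come from GET /v1/usage, whose response shape is not shown on this page, so it is passed in as a plain number.

```python
def batch_budget(estimated_credits: list[float], headroom: float = 0.0) -> float:
    """Sum per-item estimates and pad by `headroom` (0.05 means 5%)."""
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return sum(estimated_credits) * (1.0 + headroom)


def drift_percent(estimated_total: float, actual_total: float) -> float:
    """Signed drift of actual vs estimated, as a percentage of the estimate."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return (actual_total - estimated_total) / estimated_total * 100.0


# Illustrative numbers: 1,000 items at the sample estimate, 5% headroom.
budget = batch_budget([0.009375] * 1000, headroom=0.05)
# Illustrative reconciliation: 9.375 credits estimated, 9.5 actually charged.
drift = drift_percent(estimated_total=9.375, actual_total=9.5)
print(round(budget, 5), round(drift, 2))
```

A positive drift means the period cost more than estimated; a negative one means it cost less.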

No row, no hold, no billing event

A successful estimate call:

Does not create an embedding row.
Does not place a hold on your balance.
Does not emit a billing event.
Counts only against your RPM rate-limit bucket. It does not consume from your TPM bucket.
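Because estimate calls draw only on the RPM bucket, pacing a large estimation pass depends on the number of requests rather than on token volume. The sketch below computes a spacing between calls. The RPM limit is whatever applies to your team (this page does not state one), and `pacing_interval`, `batch_duration`, and the `safety` factor are hypothetical choices for the example.

```python
def pacing_interval(rpm_limit: int, safety: float = 0.9) -> float:
    """Seconds to wait between calls to stay at `safety` times the RPM limit."""
    if rpm_limit <= 0:
        raise ValueError("rpm_limit must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    return 60.0 / (rpm_limit * safety)


def batch_duration(n_calls: int, rpm_limit: int, safety: float = 0.9) -> float:
    """Approximate seconds needed to issue n_calls at the paced interval."""
    return max(n_calls - 1, 0) * pacing_interval(rpm_limit, safety)


# Placeholder limit of 600 requests per minute.
interval = pacing_interval(rpm_limit=600)
print(round(interval, 3), round(batch_duration(10_000, rpm_limit=600), 1))
```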

A 4xx from the estimate endpoint behaves like any other validation 4xx — same typed error envelope, no row, no charge.

Errors

Estimates run all the same input checks as the live endpoint, so the same codes apply:

embeddings_batch_not_supported (400): input is an array of plain strings (string[]). See Multimodal.
embeddings_input_too_many_items (400): over the 16-part or 8-image cap. Split into multiple requests.
embeddings_video_unsupported (400): any video_url part is rejected. Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
embeddings_input_too_large (400): the pre-fetch token estimate is over the context window. Trim the input.
embeddings_unsupported_dimensions (400): the requested dimensions value is not supported. See Dimensions.
model_not_found (404): unknown model slug. List models with GET /v1/models.
model_disabled (403): the model exists but an admin has deactivated it.
model_wrong_kind (400): you sent a chat model to the embeddings estimate endpoint.
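For client code that needs to react to these codes, a lookup table is usually enough. The sketch below paraphrases the list above. `REMEDIES` and `remedy_for` are hypothetical, the remedy strings are suggestions rather than platform text, and the way you extract the code from the error envelope depends on the envelope shape documented in Errors.

```python
# Error code -> suggested next step, paraphrased from the list above.
REMEDIES = {
    "embeddings_batch_not_supported": "send one string or one content-parts array per request",
    "embeddings_input_too_many_items": "split into multiple requests",
    "embeddings_video_unsupported": "extract a frame and submit it as image_url",
    "embeddings_input_too_large": "trim the input",
    "embeddings_unsupported_dimensions": "pick a supported dimensions value",
    "model_not_found": "list models with GET /v1/models",
    "model_disabled": "ask an admin to re-enable the model or choose another",
    "model_wrong_kind": "use an embeddings model slug",
}


def remedy_for(code: str) -> str:
    """Look up a suggested next step for a typed error code."""
    return REMEDIES.get(code, "unrecognized code; see Errors")


print(remedy_for("embeddings_input_too_large"))
```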

Server errors (5xx) on the estimate endpoint are platform issues — retry with backoff. They do not consume rate-limit budget.

Related

Overview: the live POST /v1/embeddings endpoint.
Pricing: the credit math the estimate is based on.
Multimodal: input shape for mixed text + image + video.
Errors: full error taxonomy.
