Embeddings are billed in credits at per-token rates surfaced on each model row at GET /v1/models under the embedding_pricing block. The rate is the caller’s effective rate — any per-team override is already applied. Each response carries a usage block with the exact charge and a per-modality breakdown so you can attribute cost to inputs.
Changed 2026-05-24. The embedding_pricing.video rate was removed. Video input on POST /v1/embeddings is rejected with embeddings_video_unsupported. The provider folds video frames into the visual billing bucket, so the published video rate never actually fired — to keep the receipt honest we removed the shape. Embed visual content from videos by extracting frames in your pipeline and submitting them as image_url parts at the visual rate. See the changelog for migration notes.

What you pay for

Each embedding request is billed across two input buckets, depending on the modalities you sent:
Bucket | Counts | Rate field on /v1/models
--- | --- | ---
Input: text | prompt_tokens from text parts (or the plain-string input) | embedding_pricing.text.credits_per_M
Input: visual | prompt_tokens attributed to image_url parts | embedding_pricing.visual.credits_per_M
Customers can compute credits-per-1K by dividing credits_per_M by 1000. There is no separate “output” bucket — embeddings return a vector, not generated tokens.
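As a rough illustration of how the published rates turn into a charge, the sketch below applies credits_per_M to per-bucket token counts. The MODEL_ROW dictionary is a hypothetical stand-in: only the embedding_pricing.{text,visual}.credits_per_M field paths come from this page, and the rest of the row shape (including the "id" key) is assumed.

```python
from decimal import Decimal

# Hypothetical model row. Only the embedding_pricing field paths are taken
# from this page; the "id" key and the overall row shape are assumptions.
MODEL_ROW = {
    "id": "aurous-embed-vision-1.0",
    "embedding_pricing": {
        "text": {"credits_per_M": 18.75},
        "visual": {"credits_per_M": 48.75},
    },
}

MILLION = Decimal(1_000_000)


def credits_per_1k(credits_per_m):
    """Convert a per-million-token rate to a per-thousand-token rate."""
    return Decimal(str(credits_per_m)) / 1000


def bucket_credits(model_row, text_tokens, visual_tokens):
    """Return (text_credits, visual_credits) at the row's published rates."""
    pricing = model_row["embedding_pricing"]
    # Go through str() so a JSON float such as 18.75 becomes an exact Decimal.
    text_rate = Decimal(str(pricing["text"]["credits_per_M"]))
    visual_rate = Decimal(str(pricing["visual"]["credits_per_M"]))
    return (
        text_tokens * text_rate / MILLION,
        visual_tokens * visual_rate / MILLION,
    )
```

Decimal is used here only to keep the arithmetic exact at six decimal places; whether the API serialises rates as numbers or strings is not stated on this page.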

Day-1 rates for aurous-embed-vision-1.0

From the seeded pricing config (rates are mutable; the live values are always at /v1/models on the per-model embedding_pricing block):
Modality | Raw rate (USD per 1M input tokens) | Credit rate (credits per 1M input tokens)
--- | --- | ---
Text | $0.125 / 1M | 18.75 credits / 1M
Visual (image) | $0.325 / 1M | 48.75 credits / 1M
The credit rate is computed at request time from usd_per_M, the platform anchor usd_per_credit (1 credit = $0.01 by default), and a percentage markup. See How cost is computed below for the formula.
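The relationship between the two rate columns can be sketched as follows. The function name and its defaults are illustrative assumptions: the $0.01 anchor is the default stated above, and the 50% markup is the value the worked examples on this page use.

```python
from decimal import Decimal


def credit_rate_per_m(usd_per_m, usd_per_credit="0.01", markup_pct="50"):
    """Convert a raw USD-per-1M rate into a credits-per-1M rate.

    Hypothetical helper: the defaults mirror the default anchor and the
    markup used in the worked examples, not a guaranteed platform setting.
    """
    usd = Decimal(str(usd_per_m))
    anchor = Decimal(str(usd_per_credit))
    markup = 1 + Decimal(str(markup_pct)) / 100
    return usd / anchor * markup


text_rate = credit_rate_per_m("0.125")    # text row of the table
visual_rate = credit_rate_per_m("0.325")  # visual row of the table
```

With those inputs the two results match the 18.75 and 48.75 credits per 1M shown in the table.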

Worked examples

The examples below use the day-1 rates and the default anchor (usd_per_credit = $0.01, markup_pct = 50%). Numbers are exact; round to your billing precision.

Example 1 — text-only, 500-token document

{
  "model": "aurous-embed-vision-1.0",
  "input": "A 500-token product description..."
}
text tokens     = 500
visual tokens   = 0

text credits    = 500 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.009375 credits
visual credits  = 0

credits_charged = 0.009375 credits
Response:
{
  "usage": {
    "prompt_tokens": 500,
    "total_tokens": 500,
    "credits_charged": 0.009375,
    "breakdown": {
      "input": { "text": 0.009375, "visual": 0, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}
(breakdown.input.video is retained on the response shape at 0 for one release cycle so existing SDKs that read it don’t break — it will be dropped in a follow-up release.)

Example 2 — multimodal, text + 1 image (~1000 visual tokens)

{
  "model": "aurous-embed-vision-1.0",
  "input": [
    { "type": "text", "text": "A 1000-token product description..." },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/product.jpg" } }
  ]
}
text tokens     = 1000
visual tokens   = 1000

text credits    = 1000 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.01875  credits
visual credits  = 1000 / 1M × $0.325 ÷ $0.01 × 1.5 = 0.04875  credits

credits_charged = 0.06750 credits
Response:
{
  "usage": {
    "prompt_tokens": 2000,
    "total_tokens": 2000,
    "credits_charged": 0.067500,
    "breakdown": {
      "input": { "text": 0.018750, "visual": 0.048750, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}

Example 3 — multimodal, text + 2 images (~2000 visual tokens)

{
  "model": "aurous-embed-vision-1.0",
  "input": [
    { "type": "text", "text": "A 2000-token combined catalog entry..." },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/front.jpg" } },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/back.jpg" } }
  ]
}
text tokens     = 2000
visual tokens   = 2000

text credits    = 2000 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.03750  credits
visual credits  = 2000 / 1M × $0.325 ÷ $0.01 × 1.5 = 0.09750  credits

credits_charged = 0.13500 credits
Response:
{
  "usage": {
    "prompt_tokens": 4000,
    "total_tokens": 4000,
    "credits_charged": 0.135000,
    "breakdown": {
      "input": { "text": 0.037500, "visual": 0.097500, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}
A typical RAG-for-images workload (one product image plus a short description) lands somewhere between Example 1 and Example 2, at a few hundredths of a credit per item. At the default anchor of 1 credit = $0.01, indexing 100K product images with descriptions costs around $67 before any volume discount (100,000 items × 0.0675 credits, the Example 2 charge, is 6,750 credits).
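A back-of-envelope version of that bulk estimate, assuming every item costs the same as one of the worked examples. The helper name is hypothetical, and volume discounts are not modelled.

```python
from decimal import Decimal


def bulk_cost(items, credits_per_item, usd_per_credit="0.01"):
    """Return (total_credits, total_usd) for `items` identical requests."""
    credits = items * Decimal(str(credits_per_item))
    return credits, credits * Decimal(str(usd_per_credit))


# Bracket a 100K-item job between Example 1 and Example 2 per-item charges.
low_credits, low_usd = bulk_cost(100_000, "0.009375")
high_credits, high_usd = bulk_cost(100_000, "0.0675")
```

Under these assumptions the job falls between 937.5 credits ($9.375) and 6,750 credits ($67.50).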

The usage.breakdown.input block

Every embedding response carries this block:
"breakdown": {
  "input": { "text": <credits>, "visual": <credits>, "video": 0 },
  "model": "<model slug>"
}
  • text — credits attributed to the text portion of the input (always present, 0 when no text).
  • visual — credits attributed to image parts (always present, 0 when no images).
  • video — deprecated 2026-05-24, retained at 0 for one release cycle so existing SDKs that read it don’t break. Will be dropped in a follow-up release.
  • model — the model slug, echoing back the request to make audit trails self-contained.
The sum of the live modality keys (text + visual) equals credits_charged in the common case (see the next section for the rare overdraft path).
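A ledger-side sanity check for that invariant might look like the sketch below. The function name and the tolerance are assumptions; the check reads only the text and visual keys, so it keeps working once the deprecated video key is dropped.

```python
from decimal import Decimal

LIVE_MODALITIES = ("text", "visual")  # "video" is deprecated and pinned at 0


def breakdown_matches(usage, tolerance="0.000001"):
    """True when text + visual reconciles to credits_charged.

    `tolerance` is an assumed rounding allowance, not a documented value.
    """
    parts = usage["breakdown"]["input"]
    total = sum(
        (Decimal(str(parts.get(key, 0))) for key in LIVE_MODALITIES),
        Decimal(0),
    )
    charged = Decimal(str(usage["credits_charged"]))
    return abs(total - charged) <= Decimal(tolerance)


# Usage block from Example 2 on this page.
example_2_usage = {
    "prompt_tokens": 2000,
    "total_tokens": 2000,
    "credits_charged": 0.067500,
    "breakdown": {
        "input": {"text": 0.018750, "visual": 0.048750, "video": 0},
        "model": "aurous-embed-vision-1.0",
    },
}
```

A False result is a cue to investigate rather than proof of a billing error.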

How cost is computed

The platform applies this formula per modality, then sums:
modality_credits = (tokens ÷ 1,000,000 × USD_per_M) ÷ USD_per_credit × (1 + markup_pct / 100)
Walking Example 2’s text bucket end-to-end:
tokens          = 1000
USD_per_M       = 0.125     ← raw upstream rate (the platform converts to credits via the formula above; the resulting credits_per_M is what's surfaced on /v1/models.embedding_pricing.text)
USD_per_credit  = 0.01      ← platform anchor (or your team override if set)
markup_pct      = 50

raw_credits     = (1000 × 0.125) / 1,000,000 / 0.01    = 0.0125
marked_credits  = 0.0125 × (1 + 50 / 100)              = 0.01875 credits
The same formula runs for visual with its own rate. The final credits_charged is the sum across text + visual. credits_charged is the authoritative value — deduct it from your team balance and store it next to the row in your ledger. On rare overdraft-fallback paths (where the upstream model returned more tokens than the held credits could cover), the modality components in breakdown.input are scaled to reconcile back to credits_charged. Raw, un-scaled values are preserved in the platform’s audit trail and surfaced through GET /v1/usage for FinOps reconciliation.
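The per-modality calculation can be sketched in a few lines. This is an illustration under the day-1 rates and default anchor and markup quoted on this page, not the platform's implementation, and it does not model the overdraft-fallback scaling.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def modality_credits(tokens, usd_per_m, usd_per_credit="0.01", markup_pct="50"):
    """Credits for one modality: tokens priced in USD, converted, marked up."""
    raw = (
        Decimal(tokens) * Decimal(str(usd_per_m)) / MILLION
        / Decimal(str(usd_per_credit))
    )
    return raw * (1 + Decimal(str(markup_pct)) / 100)


def credits_charged(text_tokens, visual_tokens,
                    text_usd_per_m="0.125", visual_usd_per_m="0.325"):
    """Sum the text and visual buckets (day-1 rates as assumed defaults)."""
    return (
        modality_credits(text_tokens, text_usd_per_m)
        + modality_credits(visual_tokens, visual_usd_per_m)
    )
```

Calling credits_charged(500, 0), credits_charged(1000, 1000), and credits_charged(2000, 2000) reproduces the totals of Examples 1 to 3.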

Rate updates — the mutability story

Embedding rates are mutable: an admin can update them at any time without a fresh Aurous-Version release, since the contract explicitly treats LLM rates as mutable. To detect a rate change, poll GET /v1/models and compare the per-model embedding_pricing.{text,visual}.credits_per_M against the value you last cached. The per-request charge is always deterministic against the rate card in force when the request was created, not against whatever the latest rates are at the moment you read the response.
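The comparison step of that polling loop could look like the sketch below. Fetching the model row is left out, and the cache layout (a plain dict keyed by modality) is an assumption made for illustration.

```python
def changed_rates(cached, model_row):
    """Return {modality: (old, new)} for rates that differ from the cache.

    `cached` is a hypothetical dict such as {"text": 18.75, "visual": 48.75};
    `model_row` is one model entry as returned by GET /v1/models.
    """
    pricing = model_row["embedding_pricing"]
    changes = {}
    for modality in ("text", "visual"):
        new = pricing[modality]["credits_per_M"]
        old = cached.get(modality)  # None when the modality was never cached
        if old != new:
            changes[modality] = (old, new)
    return changes
```

An empty result means the cached rate card is still current; otherwise update the cache and re-run any cost projections that depend on it.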

Estimating cost before dispatch

Send the same payload to POST /v1/embeddings/estimate to preview the charge without billing. See Estimate for the endpoint shape, the upper-bound caveat, and SDK examples.

Refunds

Outcome | Charged for
--- | ---
Successful embedding | Actuals as reported in usage.
Bad input (400 from the platform) | 0. No row written, no credits deducted.
Provider returned a 5xx | 0. The hold is released. Retry with backoff.
embeddings_input_too_large (pre-fetch token estimate exceeds context window) | 0. The DTO rejects before the provider is called.
embeddings_video_unsupported (any video_url part) | 0. Rejected at the DTO boundary; embed extracted frames as image_url instead.
The policy is: you pay for what was delivered. Embeddings are atomic — there is no partial-completion concept the way streamed chat has — so the row either lands with a charge or doesn’t land at all.

Idempotency and billing

POST /v1/embeddings accepts Idempotency-Key. Same key + same body within 24h replays the cached response — including the original credits_charged. You will NOT be double-charged for a retried key. Same key + different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
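One way to combine retries with a stable key is sketched below. The send callable is a hypothetical transport hook (it takes headers and a body and returns a status code and a parsed response), and the retry count and backoff delays are illustrative choices rather than documented values.

```python
import json
import time
import uuid


def post_with_retry(send, payload, max_attempts=3, base_delay=0.5,
                    sleep=time.sleep):
    """POST `payload` via `send`, reusing one Idempotency-Key across retries.

    `send(headers, body)` is an assumed transport hook returning
    (status_code, response). `max_attempts` must be at least 1.
    """
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": str(uuid.uuid4()),  # generated once per request
    }
    # Serialise once so every retry carries a byte-identical body.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    for attempt in range(max_attempts):
        status, response = send(headers, body)
        if status < 500:
            return status, response  # success or a non-retryable client error
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status, response
```

Generating the key and the body once, outside the loop, is what keeps each retry on the "same key + same body" replay path instead of the 409 path.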

Where to read rates

  • GET /v1/models — per-model embedding_pricing.{text,visual}.credits_per_M rows (the caller’s effective rate, including any per-team override).
  • Estimate — preview cost for a specific payload.

Common questions

Are the per-modality rates the same?
No. On aurous-embed-vision-1.0 the visual rate is higher than text per token, reflecting the cost difference. Plain-string input pays only the text rate.

Can I see usage trends?
Yes. GET /v1/usage aggregates credits by day/key/model with the same per-modality split.

What happened to the video rate?
Removed 2026-05-24. The provider folds video frames into the visual billing bucket, so the published video rate never actually fired. Submitting a video_url part now returns embeddings_video_unsupported. Extract a representative frame in your pipeline and embed it as an image_url part at the visual rate.
  • Overview — embeddings surface and quick start.
  • Multimodal — the input shapes that drive the visual bucket.
  • Estimate — preview cost without charging.
  • Errors — the full error taxonomy.