Pricing

Embeddings are billed in credits at per-token rates surfaced on each model row at GET /v1/models under the embedding_pricing block. The rate is the caller’s effective rate — any per-team override is already applied. Each response carries a usage block with the exact charge and a per-modality breakdown so you can attribute cost to inputs.

Changed 2026-05-24. The embedding_pricing.video rate was removed. Video input on POST /v1/embeddings is rejected with embeddings_video_unsupported. The provider folds video frames into the visual billing bucket, so the published video rate never actually fired — to keep the receipt honest we removed the shape. Embed visual content from videos by extracting frames in your pipeline and submitting them as image_url parts at the visual rate. See the changelog for migration notes.

What you pay for

Each embedding request is billed across two input buckets, depending on the modalities you sent:

Bucket	Counts	Rate field on `/v1/models`
Input — text	`prompt_tokens` from `text` parts (or the plain-string `input`)	`embedding_pricing.text.credits_per_M`
Input — visual	`prompt_tokens` attributed to `image_url` parts	`embedding_pricing.visual.credits_per_M`

To get a per-1K rate, divide credits_per_M by 1000. There is no separate “output” bucket, because embeddings return a vector, not generated tokens.

Day-1 rates for `aurous-embed-vision-1.0`

From the seeded pricing config (rates are mutable; the live values are always at /v1/models on the per-model embedding_pricing block):

Modality	Raw rate (USD per 1M input tokens)	Credit rate (credits per 1M input tokens)
Text	$0.125 / 1M	18.75 credits / 1M
Visual (image)	$0.325 / 1M	48.75 credits / 1M

The credit rate is computed at request time from usd_per_M, the platform anchor usd_per_credit (1 credit = $0.01 by default), and a percentage markup. See How cost is computed below for the formula.
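As a rough sketch of that conversion, the Python below reproduces the day-1 credit rates from the raw USD rates. The helper name `credits_per_million` and its parameter names are illustrative and not part of the platform API; the 50% markup is the default used in the worked examples.

```python
from decimal import Decimal

def credits_per_million(usd_per_m: str, usd_per_credit: str = "0.01",
                        markup_pct: str = "50") -> Decimal:
    """Convert a raw USD-per-1M-token rate into credits per 1M tokens."""
    # USD -> credits at the anchor, before markup.
    base = Decimal(usd_per_m) / Decimal(usd_per_credit)
    # Apply the percentage markup (50 means multiply by 1.5).
    return base * (1 + Decimal(markup_pct) / 100)

text_rate = credits_per_million("0.125")    # text, USD per 1M tokens
visual_rate = credits_per_million("0.325")  # visual, USD per 1M tokens
print(text_rate, visual_rate)
```

Decimal arithmetic is used here so the results match the table digit for digit; a float-based version would likely be fine for estimates.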

Worked examples

The examples below use the day-1 rates and the default anchor (usd_per_credit = $0.01, markup_pct = 50%). Numbers are exact; round to your billing precision.

Example 1 — text-only, 500-token document

{
  "model": "aurous-embed-vision-1.0",
  "input": "A 500-token product description..."
}

text tokens     = 500
visual tokens   = 0

text credits    = 500 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.009375 credits
visual credits  = 0

credits_charged = 0.009375 credits

Response:

{
  "usage": {
    "prompt_tokens": 500,
    "total_tokens": 500,
    "credits_charged": 0.009375,
    "breakdown": {
      "input": { "text": 0.009375, "visual": 0, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}

(breakdown.input.video is retained on the response shape at 0 for one release cycle so existing SDKs that read it don’t break — it will be dropped in a follow-up release.)

Example 2 — multimodal, text + 1 image (~1000 visual tokens)

{
  "model": "aurous-embed-vision-1.0",
  "input": [
    { "type": "text", "text": "A 1000-token product description..." },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/product.jpg" } }
  ]
}

text tokens     = 1000
visual tokens   = 1000

text credits    = 1000 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.01875  credits
visual credits  = 1000 / 1M × $0.325 ÷ $0.01 × 1.5 = 0.04875  credits

credits_charged = 0.06750 credits

Response:

{
  "usage": {
    "prompt_tokens": 2000,
    "total_tokens": 2000,
    "credits_charged": 0.067500,
    "breakdown": {
      "input": { "text": 0.018750, "visual": 0.048750, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}

Example 3 — multimodal, text + 2 images (~2000 visual tokens)

{
  "model": "aurous-embed-vision-1.0",
  "input": [
    { "type": "text", "text": "A 2000-token combined catalog entry..." },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/front.jpg" } },
    { "type": "image_url", "image_url": { "url": "https://assets.aurous-labs.com/example-images/back.jpg" } }
  ]
}

text tokens     = 2000
visual tokens   = 2000

text credits    = 2000 / 1M × $0.125 ÷ $0.01 × 1.5 = 0.03750  credits
visual credits  = 2000 / 1M × $0.325 ÷ $0.01 × 1.5 = 0.09750  credits

credits_charged = 0.13500 credits

Response:

{
  "usage": {
    "prompt_tokens": 4000,
    "total_tokens": 4000,
    "credits_charged": 0.135000,
    "breakdown": {
      "input": { "text": 0.037500, "visual": 0.097500, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}

A typical RAG-for-images workload, one product image plus a short description, lands somewhere between Example 1 and Example 2 (a few hundredths of a credit per item). At the default anchor of 1 credit = $0.01, indexing 100K product images with descriptions sits around $67 before any volume discount.
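The indexing figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes every item costs the same as Example 2 (0.0675 credits) and uses the default $0.01 anchor; `bulk_cost_usd` is an illustrative helper, not a platform function.

```python
from decimal import Decimal

def bulk_cost_usd(items: int, credits_per_item: str,
                  usd_per_credit: str = "0.01") -> Decimal:
    """Estimate total USD cost for `items` requests at a fixed per-item charge."""
    total_credits = Decimal(items) * Decimal(credits_per_item)
    return total_credits * Decimal(usd_per_credit)

# 100K items, each charged like Example 2 (assumed upper end of the range).
estimate = bulk_cost_usd(100_000, "0.0675")
print(f"${estimate:.2f}")
```

Because real items fall between Example 1 and Example 2, this treats Example 2 as the upper end; actual spend depends on the token counts reported in each response.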

The `usage.breakdown.input` block

Every embedding response carries this block:

"breakdown": {
  "input": { "text": <credits>, "visual": <credits>, "video": 0 },
  "model": "<model slug>"
}

text — credits attributed to the text portion of the input (always present, 0 when no text).
visual — credits attributed to image parts (always present, 0 when no images).
video — deprecated 2026-05-24, retained at 0 for one release cycle so existing SDKs that read it don’t break. Will be dropped in a follow-up release.
model — the model slug, echoing back the request to make audit trails self-contained.

The sum of the live modality keys (text + visual) equals credits_charged in the common case (see the next section for the rare overdraft path).
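A ledger job could verify that relationship before storing a row. The sketch below is one possible check, not a platform-provided utility: `breakdown_matches` and its tolerance are assumptions, and the sample `usage` dict is copied from Example 2.

```python
from decimal import Decimal

def breakdown_matches(usage: dict, tolerance: str = "0.000001") -> bool:
    """Return True when text + visual credits equal credits_charged within tolerance."""
    parts = usage["breakdown"]["input"]
    # Go through str() so float payload values become exact decimal literals.
    # The deprecated "video" key is ignored on purpose.
    live_total = Decimal(str(parts["text"])) + Decimal(str(parts["visual"]))
    charged = Decimal(str(usage["credits_charged"]))
    return abs(live_total - charged) <= Decimal(tolerance)

usage = {
    "prompt_tokens": 2000,
    "total_tokens": 2000,
    "credits_charged": 0.0675,
    "breakdown": {
        "input": {"text": 0.01875, "visual": 0.04875, "video": 0},
        "model": "aurous-embed-vision-1.0",
    },
}
print(breakdown_matches(usage))
```

A mismatch is not necessarily an error in your code; treat credits_charged as the value to store and flag the row for review.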

How cost is computed

The platform applies this formula per modality, then sums:

modality_credits = (tokens ÷ 1,000,000 × USD_per_M) ÷ USD_per_credit × (1 + markup_pct / 100)

Walking Example 2’s text bucket end-to-end:

tokens          = 1000
USD_per_M       = 0.125     ← raw upstream rate (the platform converts to credits via the formula above; the resulting credits_per_M is what's surfaced on /v1/models.embedding_pricing.text)
USD_per_credit  = 0.01      ← platform anchor (or your team override if set)
markup_pct      = 50

raw_credits     = (1000 × 0.125) / 1,000,000 / 0.01    = 0.0125
marked_credits  = 0.0125 × (1 + 50 / 100)              = 0.01875 credits
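The walk-through can be expressed as a small Python function and run against all three worked examples. This is a sketch under the day-1 rates and default anchor; the function names are hypothetical, and the platform's own rounding behaviour is not specified here.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def modality_credits(tokens: int, usd_per_m: str,
                     usd_per_credit: str = "0.01", markup_pct: str = "50") -> Decimal:
    """Credits for one modality: scale tokens to millions, convert, then mark up."""
    raw = Decimal(tokens) / MILLION * Decimal(usd_per_m) / Decimal(usd_per_credit)
    return raw * (1 + Decimal(markup_pct) / 100)

def embedding_charge(text_tokens: int, visual_tokens: int) -> dict:
    """Sum the text and visual buckets at the day-1 USD rates."""
    text = modality_credits(text_tokens, "0.125")
    visual = modality_credits(visual_tokens, "0.325")
    return {"text": text, "visual": visual, "credits_charged": text + visual}

# Token counts from Examples 1, 2, and 3.
for text_tokens, visual_tokens in [(500, 0), (1000, 1000), (2000, 2000)]:
    print(embedding_charge(text_tokens, visual_tokens)["credits_charged"])
```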

The same formula runs for visual with its own rate. The final credits_charged is the sum across text + visual. credits_charged is the authoritative value — deduct it from your team balance and store it next to the row in your ledger. On rare overdraft-fallback paths (where the upstream model returned more tokens than the held credits could cover), the modality components in breakdown.input are scaled to reconcile back to credits_charged. Raw, un-scaled values are preserved in the platform’s audit trail and surfaced through GET /v1/usage for FinOps reconciliation.

Rate updates — the mutability story

Embedding rates are mutable: an admin can update them at any time without a fresh Aurous-Version release (LLM rates are explicitly mutable in the contract). To detect a rate change, poll GET /v1/models and compare the per-model embedding_pricing.{text,visual}.credits_per_M against the value you last cached. The per-request charge is always deterministic against the rate card in force when the request was created, not against whatever the latest rates are at the moment you read the response.
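One way to structure the comparison step is sketched below. It assumes you have already fetched a model's embedding_pricing block and that it nests as `{"text": {"credits_per_M": ...}, "visual": {...}}`, which is inferred from the field paths on this page; the fetch itself is omitted, `changed_rates` is a hypothetical helper, and the 52.0 value is made up for illustration.

```python
def changed_rates(cached: dict, live: dict) -> dict:
    """Map each modality whose credits_per_M differs to its (old, new) pair."""
    changes = {}
    for modality in ("text", "visual"):
        old = cached.get(modality, {}).get("credits_per_M")
        new = live.get(modality, {}).get("credits_per_M")
        if old != new:
            changes[modality] = (old, new)
    return changes

# Last cached block versus a freshly polled one (visual rate is hypothetical).
cached = {"text": {"credits_per_M": 18.75}, "visual": {"credits_per_M": 48.75}}
live = {"text": {"credits_per_M": 18.75}, "visual": {"credits_per_M": 52.0}}
print(changed_rates(cached, live))
```

An empty result means the cached rate card is still current; a non-empty one is the signal to refresh any downstream cost projections.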

Estimating cost before dispatch

Send the same payload to POST /v1/embeddings/estimate to preview the charge without billing. See Estimate for the endpoint shape, the upper-bound caveat, and SDK examples.

Refunds

Outcome	Charged for
Successful embedding	Actuals as reported in `usage`.
Bad input (400 from the platform)	0. No row written, no credits deducted.
Provider returned a 5xx	0. The hold is released. Retry with backoff.
`embeddings_input_too_large` (pre-fetch token estimate exceeds context window)	0. The DTO rejects before the provider is called.
`embeddings_video_unsupported` (any `video_url` part)	0. Rejected at the DTO boundary; embed extracted frames as `image_url` instead.

The policy is: you pay for what was delivered. Embeddings are atomic — there is no partial-completion concept the way streamed chat has — so the row either lands with a charge or doesn’t land at all.
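Since a provider 5xx is never charged, retrying is safe from a billing standpoint. The sketch below shows one generic backoff loop; `send` is a hypothetical callable that performs the request and returns a `(status_code, parsed_body)` tuple, and the attempt count and delays are arbitrary choices rather than platform recommendations.

```python
import random
import time

def send_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            # Success or a 4xx: retrying would not change the outcome.
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter between attempts.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status, body

# Simulated transport: two provider failures, then a successful embedding.
responses = iter([(503, None), (502, None),
                  (200, {"usage": {"credits_charged": 0.009375}})])
status, body = send_with_backoff(lambda: next(responses), sleep=lambda _: None)
print(status, body["usage"]["credits_charged"])
```

Injecting `sleep` keeps the loop testable without real delays.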

Idempotency and billing

POST /v1/embeddings accepts Idempotency-Key. Same key + same body within 24h replays the cached response — including the original credits_charged. You will NOT be double-charged for a retried key. Same key + different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
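A client might prepare a retry-safe request along these lines. The sketch only builds the request description and does not send it; `build_embedding_request` is a hypothetical helper, and serializing with sorted keys is a precaution on the assumption that "same body" is compared on the serialized payload.

```python
import json
import uuid
from typing import Optional

def build_embedding_request(payload: dict, idempotency_key: Optional[str] = None) -> dict:
    """Describe a POST /v1/embeddings call, reusing the key when one is supplied."""
    key = idempotency_key or str(uuid.uuid4())
    return {
        "method": "POST",
        "path": "/v1/embeddings",
        "headers": {"Content-Type": "application/json", "Idempotency-Key": key},
        # Stable key order so a retry serializes to the same bytes.
        "body": json.dumps(payload, sort_keys=True),
    }

payload = {"model": "aurous-embed-vision-1.0", "input": "A 500-token product description..."}
first = build_embedding_request(payload)
# On retry, pass the original key back in instead of generating a new one.
retry = build_embedding_request(payload, first["headers"]["Idempotency-Key"])
print(first["headers"]["Idempotency-Key"] == retry["headers"]["Idempotency-Key"])
```

Generate the key once per logical request and persist it alongside the pending row, so a process restart does not mint a new key for the same work.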

Where to read rates

GET /v1/models — per-model embedding_pricing.{text,visual}.credits_per_M rows (the caller’s effective rate, including any per-team override).
Estimate — preview cost for a specific payload.

Common questions

Are the per-modality rates the same? No. On aurous-embed-vision-1.0 the visual rate is higher than text per token, reflecting the cost difference. Plain-string input pays only the text rate.

Can I see usage trends? Yes. GET /v1/usage aggregates credits by day/key/model with the same per-modality split.

What happened to the video rate? Removed 2026-05-24. The provider folds video frames into the visual billing bucket, so the published video rate never actually fired. Submitting a video_url part now returns embeddings_video_unsupported. Extract a representative frame in your pipeline and embed it as an image_url part at the visual rate.

Related

Overview — embeddings surface and quick start.
Multimodal — the input shapes that drive the visual bucket.
Estimate — preview cost without charging.
Errors — the full error taxonomy.
