Embedding rates are published on GET /v1/models under the embedding_pricing block. The rate is the caller’s effective rate: any per-team override is already applied. Each response carries a usage block with the exact charge and a per-modality breakdown so you can attribute cost to inputs.
Changed 2026-05-24. The embedding_pricing.video rate was removed. Video input on POST /v1/embeddings is rejected with embeddings_video_unsupported. The provider folds video frames into the visual billing bucket, so the published video rate never actually fired; to keep the receipt honest we removed the shape. Embed visual content from videos by extracting frames in your pipeline and submitting them as image_url parts at the visual rate. See the changelog for migration notes.
What you pay for
Each embedding request is billed across two input buckets, depending on the modalities you sent:

| Bucket | Counts | Rate field on /v1/models |
|---|---|---|
| Input — text | prompt_tokens from text parts (or the plain-string input) | embedding_pricing.text.credits_per_M |
| Input — visual | prompt_tokens attributed to image_url parts | embedding_pricing.visual.credits_per_M |
credits_per_M by 1000.
There is no separate “output” bucket — embeddings return a vector, not generated tokens.
Day-1 rates for aurous-embed-vision-1.0
From the seeded pricing config (rates are mutable; the live values are always at /v1/models on the per-model embedding_pricing block):
| Modality | Raw rate (USD per 1M input tokens) | Credit rate (credits per 1M input tokens) |
|---|---|---|
| Text | $0.125 / 1M | 18.75 credits / 1M |
| Visual (image) | $0.325 / 1M | 48.75 credits / 1M |
The credit rate is derived from the raw usd_per_M, the platform anchor usd_per_credit (1 credit = $0.01 by default), and a percentage markup. See How cost is computed below for the formula.
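The day-1 table is consistent with dividing the raw USD rate by the anchor and then applying the markup. The sketch below reproduces the two credit rates under that assumption; the function name `credits_per_million` is hypothetical and the formula is inferred from the published numbers, not taken from the platform's code.

```python
from decimal import Decimal


def credits_per_million(usd_per_m: Decimal, usd_per_credit: Decimal, markup_pct: Decimal) -> Decimal:
    """Convert a raw USD rate per 1M tokens into a credit rate per 1M tokens.

    Assumed derivation: (usd_per_M / usd_per_credit) * (1 + markup_pct / 100).
    """
    base_credits = usd_per_m / usd_per_credit
    return base_credits * (1 + markup_pct / Decimal(100))


# Day-1 inputs from the table, with the default anchor and a 50% markup.
text_rate = credits_per_million(Decimal("0.125"), Decimal("0.01"), Decimal("50"))
visual_rate = credits_per_million(Decimal("0.325"), Decimal("0.01"), Decimal("50"))
print(text_rate, visual_rate)
```

Using `Decimal` rather than floats keeps the arithmetic exact, which matters when the result is compared against a published rate.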
Worked examples
The examples below use the day-1 rates and the default anchor (usd_per_credit = $0.01, markup_pct = 50%). Numbers are exact; round to your billing precision.
Example 1 — text-only, 500-token document
(breakdown.input.video is retained on the response shape at 0 for one release cycle so existing SDKs that read it don’t break; it will be dropped in a follow-up release.)
Example 2 — multimodal, text + 1 image (~1000 visual tokens)
Example 3 — multimodal, text + 2 images (~2000 visual tokens)
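The arithmetic behind these examples can be sketched as follows, assuming each bucket is charged as tokens / 1,000,000 × credits_per_M at the day-1 rates. The helper `embedding_credits` is a hypothetical name. The text-token counts for Examples 2 and 3 are not stated above, so only their visual components are computed here.

```python
from decimal import Decimal

TEXT_RATE = Decimal("18.75")    # day-1 credits per 1M text tokens
VISUAL_RATE = Decimal("48.75")  # day-1 credits per 1M visual tokens
PER_M = Decimal(1_000_000)


def embedding_credits(text_tokens: int, visual_tokens: int = 0) -> dict:
    """Charge each bucket at its own rate, then sum the two components."""
    text = Decimal(text_tokens) * TEXT_RATE / PER_M
    visual = Decimal(visual_tokens) * VISUAL_RATE / PER_M
    return {"text": text, "visual": visual, "credits_charged": text + visual}


# Example 1: text-only, 500-token document.
example_1 = embedding_credits(500)

# Examples 2 and 3: visual component only (about 1000 and 2000 visual tokens).
example_2_visual = embedding_credits(0, 1000)["visual"]
example_3_visual = embedding_credits(0, 2000)["visual"]

print(example_1["credits_charged"], example_2_visual, example_3_visual)
```

For a full multimodal total, pass both token counts in one call; the text share is added at the text rate.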
The usage.breakdown.input block
Every embedding response carries this block:
- text: credits attributed to the text portion of the input (always present, 0 when no text).
- visual: credits attributed to image parts (always present, 0 when no images).
- video: deprecated 2026-05-24, retained at 0 for one release cycle so existing SDKs that read it don’t break. Will be dropped in a follow-up release.
- model: the model slug, echoing back the request to make audit trails self-contained.
The sum of the modality components (text + visual) equals credits_charged in the common case (see the next section for the rare overdraft path).
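A client can check that relationship before writing a ledger row. The sketch below assumes the usage block is a JSON object shaped as `usage.credits_charged` plus `usage.breakdown.input.{text,visual,video,model}`; that nesting is inferred from the field names on this page, and the sample values are illustrative rather than a real response.

```python
from decimal import Decimal


def breakdown_reconciles(usage: dict) -> bool:
    """Return True when text + visual equals credits_charged.

    The deprecated video field is ignored: it is pinned at 0 and may
    disappear from the response in a later release.
    """
    parts = usage["breakdown"]["input"]
    total = Decimal(str(parts["text"])) + Decimal(str(parts["visual"]))
    return total == Decimal(str(usage["credits_charged"]))


# Hypothetical usage block for a request with 500 text and 1000 visual tokens.
sample_usage = {
    "credits_charged": 0.058125,
    "breakdown": {
        "input": {
            "text": 0.009375,
            "visual": 0.04875,
            "video": 0,
            "model": "aurous-embed-vision-1.0",
        }
    },
}

print(breakdown_reconciles(sample_usage))
```

Converting through `str` before building each `Decimal` avoids carrying binary float noise into the comparison.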
How cost is computed
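The platform applies this formula per modality, then sums: credits = (prompt_tokens / 1,000,000) × credits_per_M. Text tokens are charged at the text rate and visual tokens at the visual rate. The final credits_charged is the sum across text + visual.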
credits_charged is the authoritative value — deduct it from your team balance and store it next to the row in your ledger. On rare overdraft-fallback paths (where the upstream model returned more tokens than the held credits could cover), the modality components in breakdown.input are scaled to reconcile back to credits_charged. Raw, un-scaled values are preserved in the platform’s audit trail and surfaced through GET /v1/usage for FinOps reconciliation.
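The text does not say how the components are scaled on the overdraft path. One plausible reading is proportional scaling, where each raw component is multiplied by credits_charged divided by the raw total. The sketch below illustrates that assumption only; `scale_breakdown` and the sample numbers are hypothetical.

```python
from decimal import Decimal


def scale_breakdown(raw: dict, credits_charged: Decimal) -> dict:
    """Scale raw per-modality credits so they sum to credits_charged.

    Assumes proportional scaling; the platform's actual rule is not
    documented on this page.
    """
    raw_total = sum(raw.values(), Decimal(0))
    if raw_total == 0:
        return {name: Decimal(0) for name in raw}
    factor = credits_charged / raw_total
    return {name: value * factor for name, value in raw.items()}


# Illustrative overdraft: raw usage of 0.08 credits, but only 0.04 was charged.
raw_components = {"text": Decimal("0.02"), "visual": Decimal("0.06")}
scaled = scale_breakdown(raw_components, Decimal("0.04"))
print(scaled)
```

Whatever rule the platform uses, credits_charged remains the value to store; the scaled components are for attribution only.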
Rate updates — the mutability story
Embedding rates are mutable: an admin can update them at any time without a fresh Aurous-Version release; LLM rates are explicitly mutable in the contract. To detect a rate change, poll GET /v1/models and compare the per-model embedding_pricing.{text,visual}.credits_per_M against the value you last cached.
The per-request charge is always deterministic against the rate card in force when the request was created — not against whatever the latest rates are at the moment you read the response.
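The polling comparison can be sketched as a pure function over two embedding_pricing blocks: the one you cached and the one just fetched. The dictionary layout mirrors the embedding_pricing.{text,visual}.credits_per_M path; the function name `changed_rates` and the changed visual value are hypothetical, and fetching the block from GET /v1/models is left to your HTTP client.

```python
def changed_rates(cached: dict, live: dict, modalities=("text", "visual")) -> dict:
    """Return {modality: (old, new)} for every rate that differs."""
    changes = {}
    for modality in modalities:
        old = cached.get(modality, {}).get("credits_per_M")
        new = live.get(modality, {}).get("credits_per_M")
        if old != new:
            changes[modality] = (old, new)
    return changes


cached_pricing = {"text": {"credits_per_M": 18.75}, "visual": {"credits_per_M": 48.75}}
# Illustrative live block in which only the visual rate has moved.
live_pricing = {"text": {"credits_per_M": 18.75}, "visual": {"credits_per_M": 52.0}}

print(changed_rates(cached_pricing, live_pricing))
```

An empty result means the cached rate card is still current and can keep being used for client-side estimates.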
Estimating cost before dispatch
Send the same payload to POST /v1/embeddings/estimate to preview the charge without billing. See Estimate for the endpoint shape, the upper-bound caveat, and SDK examples.
Refunds
| Outcome | Charged for |
|---|---|
| Successful embedding | Actuals as reported in usage. |
| Bad input (400 from the platform) | 0. No row written, no credits deducted. |
| Provider returned a 5xx | 0. The hold is released. Retry with backoff. |
| embeddings_input_too_large (pre-fetch token estimate exceeds context window) | 0. The DTO rejects before the provider is called. |
| embeddings_video_unsupported (any video_url part) | 0. Rejected at the DTO boundary; embed extracted frames as image_url instead. |
Idempotency and billing
POST /v1/embeddings accepts Idempotency-Key. Same key + same body within 24h replays the cached response — including the original credits_charged. You will NOT be double-charged for a retried key. Same key + different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
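The replay rules can be modelled with a small in-memory cache. This is a toy illustration of the semantics described above, not the platform's implementation: the class name, the error body shape, and the behaviour after the 24h window (treated here as a fresh request) are assumptions.

```python
import hashlib
import json
import time

DAY_SECONDS = 24 * 60 * 60


class IdempotencyModel:
    """Toy model: same key + same body replays, same key + different body conflicts."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # key -> (body fingerprint, stored_at, response)

    @staticmethod
    def _fingerprint(body: dict) -> str:
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def handle(self, key: str, body: dict, bill):
        now = self._clock()
        fingerprint = self._fingerprint(body)
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < DAY_SECONDS:
            if entry[0] != fingerprint:
                return 409, {"error": "idempotency_key_in_use"}
            # Replay the cached response, including the original credits_charged.
            return 200, entry[2]
        response = bill(body)  # billing happens only on the first request
        self._entries[key] = (fingerprint, now, response)
        return 200, response
```

On the client side, the practical consequence is to generate one key per logical request and reuse it unchanged on every retry of that request.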
Where to read rates
- GET /v1/models: per-model embedding_pricing.{text,visual}.credits_per_M rows (the caller’s effective rate, including any per-team override).
- Estimate: preview cost for a specific payload.
Common questions
Are the per-modality rates the same? No. On aurous-embed-vision-1.0 the visual rate is higher than text per token, reflecting the cost difference. Plain-string input pays only the text rate.
Can I see usage trends? Yes — GET /v1/usage aggregates credits by day/key/model with the same per-modality split.
What happened to the video rate? Removed 2026-05-24. The provider folds video frames into the visual billing bucket — the published video rate never actually fired. Submitting a video_url part now returns embeddings_video_unsupported. Extract a representative frame in your pipeline and embed it as an image_url part at the visual rate.
Related
- Overview — embeddings surface and quick start.
- Multimodal — the input shapes that drive the visual bucket.
- Estimate — preview cost without charging.
- Errors — the full error taxonomy.

