Aurous Labs ships the credit charge for every billable call inline with the response. There’s no separate metering API to reconcile against — usage.credits_charged IS the truth, and usage.breakdown shows you the per-token math that derived it. This page walks through reading the breakdown end-to-end and reconciling it against the live rate card.

The receipt fields

Every billed response (POST /v1/chat/completions, POST /v1/embeddings, POST /v1/images, POST /v1/videos) carries a usage block. It comes in two shape variants, chat and embedding, corresponding to the different cost classes each surface bills against.

Chat usage block

{
  "usage": {
    "prompt_tokens": 102,
    "completion_tokens": 47,
    "total_tokens": 149,
    "credits_charged": 0.0298,
    "breakdown": {
      "input_credits": 0.0145,
      "output_credits": 0.0153,
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 1
    }
  }
}
  • prompt_tokens / completion_tokens / total_tokens: OpenAI-shaped token counts. These are the billable counts.
  • reasoning_tokens: a top-level usage field present only on responses where the model emitted reasoning. That is typically when reasoning_effort was set, but the field can also appear with the default (disabled) thinking mode if the model produced internal reasoning anyway. It counts the model’s internal reasoning tokens, SEPARATE from completion_tokens (which counts only visible output text). Reasoning is billed at the output rate; the credit subtotal is in breakdown.reasoning_credits.
  • credits_charged: the single number you owe for this call. Currency is “credit”; the credit-to-USD conversion is in your team’s billing settings.
  • breakdown.input_credits / breakdown.output_credits: credit amounts spent on input tokens vs. output tokens. Together with breakdown.reasoning_credits (when present), they sum to credits_charged within 4-decimal-place rounding.
  • breakdown.pricing_version: the rate-card version used to compute this charge. See Pricing version pinning below.
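
A minimal sketch for checking a chat receipt’s subtotals client-side. It assumes breakdown.reasoning_credits, when present, is a separate addend next to input_credits and output_credits (the field list implies this, but no example receipt on this page shows it), and it uses 0.0001 credits as rounding slack. ChatReceipt, parse_chat_usage, and subtotals_match are hypothetical helper names, not part of any Aurous SDK.

```python
from dataclasses import dataclass


@dataclass
class ChatReceipt:
    credits_charged: float
    input_credits: float
    output_credits: float
    reasoning_credits: float = 0.0  # 0 when the model emitted no reasoning


def parse_chat_usage(usage: dict) -> ChatReceipt:
    """Pull the credit fields out of a chat response's usage block."""
    breakdown = usage["breakdown"]
    return ChatReceipt(
        credits_charged=usage["credits_charged"],
        input_credits=breakdown["input_credits"],
        output_credits=breakdown["output_credits"],
        # Assumption: reasoning_credits is a separate addend, absent when unused.
        reasoning_credits=breakdown.get("reasoning_credits", 0.0),
    )


def subtotals_match(receipt: ChatReceipt, tolerance: float = 0.0001) -> bool:
    """True if the breakdown sums to credits_charged within rounding slack."""
    total = receipt.input_credits + receipt.output_credits + receipt.reasoning_credits
    return abs(total - receipt.credits_charged) <= tolerance
```

Call it as subtotals_match(parse_chat_usage(response["usage"])) on each chat response you log.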

Embedding usage block

{
  "usage": {
    "prompt_tokens": 1247,
    "total_tokens": 1247,
    "credits_charged": 0.00374,
    "breakdown": {
      "input": { "text": 0.00018, "visual": 0.00356, "video": 0 },
      "model": "aurous-embed-vision"
    }
  }
}
  • prompt_tokens / total_tokens — token counts (embeddings have no output tokens; the two fields are identical and present for OpenAI-shape compatibility).
  • credits_charged — single credit-amount owed.
  • breakdown.input.{text, visual, video} — per-modality credit amounts. They sum (within rounding) to credits_charged. The video / visual fields are always present (zero when unused) so client code can sum them without optional-chaining.
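
Because all three modality keys are always present, reconciling an embedding receipt is a plain three-term sum with no optional-chaining. A sketch; embedding_modality_total and embedding_receipt_ok are illustrative names, and the default tolerance simply reuses the 0.0001-credit reconciliation threshold used elsewhere on this page.

```python
MODALITIES = ("text", "visual", "video")


def embedding_modality_total(usage: dict) -> float:
    """Sum breakdown.input.{text, visual, video} from an embedding usage block."""
    per_modality = usage["breakdown"]["input"]
    # Every key is always present (zero when unused), so plain indexing is safe.
    return sum(per_modality[m] for m in MODALITIES)


def embedding_receipt_ok(usage: dict, tolerance: float = 0.0001) -> bool:
    """True if the per-modality amounts sum to credits_charged within tolerance."""
    return abs(embedding_modality_total(usage) - usage["credits_charged"]) <= tolerance
```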
The same shape ships on streamed chat responses — usage arrives in the final non-[DONE] chunk’s usage block (matching OpenAI’s streaming-with-usage semantics, which we enable by default on our side).
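
A sketch of pulling the usage block out of a streamed response. It assumes the stream arrives as server-sent-event lines of the form data: {json} and ends with data: [DONE], as in OpenAI’s streaming format, and that chunks without usage either omit the field or carry usage: null. usage_from_stream is a hypothetical helper that works on any iterable of decoded lines.

```python
import json
from typing import Iterable, Optional


def usage_from_stream(lines: Iterable[str]) -> Optional[dict]:
    """Return the usage block from the last chunk that carried one, if any."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # the terminator carries no JSON
        chunk = json.loads(payload)
        if chunk.get("usage"):  # None / missing on ordinary content chunks
            usage = chunk["usage"]
    return usage
```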

Reconciling the charge

The simplest reconciliation: take the live per-model rate from GET /v1/models and multiply per-token-class counts by per-token-class rates.
import os

import requests

API_BASE = "https://api.aurous-labs.com/v1"
HEADERS = {"X-Api-Key": os.environ["AUROUS_API_KEY"]}

resp = requests.get(f"{API_BASE}/models", headers=HEADERS)
resp.raise_for_status()
models = resp.json()

# `data` is a list of model rows. Each chat model carries a `chat_pricing`
# block with `input.credits_per_M` and `output.credits_per_M` (the caller's
# effective rate, including any per-team override). Rates are per million
# tokens, so divide by 1_000_000 when multiplying against token counts.
chat_model = next(m for m in models["data"] if m["id"] == "aurous-grow-2.0-pro")
input_credits_per_M  = chat_model["chat_pricing"]["input"]["credits_per_M"]
output_credits_per_M = chat_model["chat_pricing"]["output"]["credits_per_M"]

# The `usage` block from the chat response you want to reconcile:
usage = {
    "prompt_tokens": 102,
    "completion_tokens": 47,
    "credits_charged": 0.0298,
    "breakdown": {
        "input_credits": 0.0145,
        "output_credits": 0.0153,
        "model": "aurous-grow-2.0-pro",
        "pricing_version": 1,
    },
}
breakdown = usage["breakdown"]
TOLERANCE = 0.0001  # amounts are serialized rounded to 4 decimal places

# (1) Check that input_credits + output_credits == credits_charged (within rounding)
total = breakdown["input_credits"] + breakdown["output_credits"]
assert abs(total - usage["credits_charged"]) <= TOLERANCE, (
    f"mismatch: {total} vs {usage['credits_charged']}"
)

# (2) Recompute each subtotal from the live rate card: token count × per-token rate
expected = {
    "input_credits": usage["prompt_tokens"] * input_credits_per_M / 1_000_000,
    "output_credits": usage["completion_tokens"] * output_credits_per_M / 1_000_000,
}
for name, value in expected.items():
    if abs(value - breakdown[name]) > TOLERANCE:
        # Either rate-card version skew (compare pricing_version) or a bug worth reporting
        print(f"{name}: receipt {breakdown[name]} vs rate card {value:.4f}")
The platform stores the raw float internally and serializes it rounded to 4 decimal places in credits_charged. If your reconciliation differs by more than 0.0001 credits in either direction, there’s either a rate-card version skew (see below) or a bug we want to know about.

Pricing version pinning

Rate cards can change. Every billed chat response stamps the pricing_version it billed against:
{
  "usage": {
    "breakdown": {
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 3,
      "input_credits": 0.0145,
      "output_credits": 0.0153
    }
  }
}
pricing_version reflects the rate-card snapshot that was in force when your request landed. If a charge stamped pricing_version: 2 shows up in your ledger and a later request stamps pricing_version: 3, the earlier charge was billed at the v2 rates. We never silently retro-bill at a new rate.

Embedding responses don’t currently include pricing_version in the breakdown; embedding rates have been stable since launch. If we change embedding pricing, we’ll add the field in lockstep.

The Aurous-Version response header you saw on every call is a different concept: that’s the API contract version, not the pricing version. The two evolve independently. See Aurous-Version for the API version pinning story.
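
Because each receipt carries its own pricing_version, reconciling a ledger that spans a rate-card change means bucketing receipts by version first. A minimal sketch: spend_by_pricing_version is a hypothetical helper that takes a list of usage blocks and totals credits_charged per (model, pricing_version), filing embedding receipts (which carry no pricing_version today) under None.

```python
from collections import defaultdict


def spend_by_pricing_version(usages: list) -> dict:
    """Total credits_charged per (model, pricing_version) across many receipts."""
    totals = defaultdict(float)
    for usage in usages:
        breakdown = usage["breakdown"]
        # Embedding breakdowns have no pricing_version yet, so .get() yields None.
        key = (breakdown["model"], breakdown.get("pricing_version"))
        totals[key] += usage["credits_charged"]
    return dict(totals)
```

Reconcile each bucket against the rates that were in force for that version. GET /v1/models serves the live rates, so keep your own copy of an older card if you need to recompute charges billed under it.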

Forecasting cost before the call

Three options:

1. POST /v1/embeddings/estimate (embeddings only)

Same body as the real call; returns the same usage shape minus the vector. No hold, no charge. See estimate docs.
curl -X POST https://api.aurous-labs.com/v1/embeddings/estimate \
  -H "X-Api-Key: $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "aurous-embed-vision", "input": "hello world" }'
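
The same estimate call from Python, using only the standard library. A sketch under stated assumptions: estimate_embedding_credits and post_json are hypothetical helpers, the estimate response is assumed to nest usage at the top level exactly like the real embeddings call, and the post parameter exists so you can swap in your own HTTP client (or a stub in tests).

```python
import json
import urllib.request

API_BASE = "https://api.aurous-labs.com/v1"


def post_json(url: str, headers: dict, body: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def estimate_embedding_credits(api_key: str, model: str, input_data, post=post_json) -> float:
    """Return usage.credits_charged from /v1/embeddings/estimate (no hold, no charge)."""
    result = post(
        f"{API_BASE}/embeddings/estimate",
        {"X-Api-Key": api_key},
        {"model": model, "input": input_data},
    )
    # Assumption: same usage shape as the real call, minus the vector.
    return result["usage"]["credits_charged"]
```

For example, estimate_embedding_credits(os.environ["AUROUS_API_KEY"], "aurous-embed-vision", "hello world") mirrors the curl call above.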

2. POST /v1/images/estimate (image generation)

Returns the credit cost without creating an inference record. Mirrors the request shape of POST /v1/images.

3. Multiply the live rate card client-side (chat)

We do not currently expose a /v1/chat/completions/estimate endpoint — chat output tokens are not knowable until the model generates them. The best you can do is multiply your worst-case max_tokens against the output rate to compute an upper bound:
# Upper bound for a chat call: max_input × input_rate + max_tokens × output_rate
# Read the per-model rate from chat_pricing on /v1/models (chat_model comes from
# the reconciliation example above). credits_per_M is per million tokens, so
# divide by 1_000_000 for the math against token counts.
input_tokens_estimate = 1_500  # illustrative: your estimate of the prompt size
max_tokens = 512               # illustrative: the max_tokens you will send
upper_bound = (
    input_tokens_estimate / 1_000_000 * chat_model["chat_pricing"]["input"]["credits_per_M"]
    + max_tokens / 1_000_000 * chat_model["chat_pricing"]["output"]["credits_per_M"]
)
If you set reasoning_effort, multiply max_reasoning_tokens against the output rate too. (Models spend fewer reasoning tokens at low effort and more at high.) The real spend will land at or below the upper bound; the receipt tells you the actual.
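
The same bound as a self-contained function that also adds the reasoning term. A sketch: chat_cost_upper_bound is a hypothetical helper that takes the chat_pricing block from /v1/models as a plain dict, and it treats max_reasoning_tokens as a budgeting number you supply, billed at the output rate as described above.

```python
def chat_cost_upper_bound(chat_pricing: dict, input_tokens: int, max_tokens: int,
                          max_reasoning_tokens: int = 0) -> float:
    """Worst-case credits for one chat call, from per-million-token rates."""
    input_per_M = chat_pricing["input"]["credits_per_M"]
    output_per_M = chat_pricing["output"]["credits_per_M"]
    # Reasoning tokens bill at the output rate, on top of visible completion tokens.
    output_tokens = max_tokens + max_reasoning_tokens
    return (input_tokens * input_per_M + output_tokens * output_per_M) / 1_000_000
```

With made-up rates of 100 credits/M input and 300 credits/M output, 1,000 input tokens and max_tokens=500 bound the call at 0.25 credits; adding max_reasoning_tokens=500 raises the bound to 0.4.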

The hold mechanism

When you POST /v1/chat/completions (non-streamed) or POST /v1/embeddings, we put a credit hold on your team for the maximum the call could cost — max_tokens × output_rate + max_input × input_rate. The hold reserves credits but doesn’t bill them. When the call completes:
  • Success: the hold is committed to a charge for the actual credits_charged. The unused portion of the hold is released back to your available balance.
  • Failure / cancellation: the entire hold is released, no charge.
So your available_credits (read from GET /v1/balance) is credits - held_credits and reflects what’s actually free to spend on the next call. The held_credits view lets you see open chat / embedding holds in real time.

For streamed chat completions, the same hold mechanism applies: the hold is committed to the actual usage on the final non-[DONE] chunk. On a partial completion (client disconnect), we commit the tokens we’d already emitted and release the rest.
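
A toy model of the hold accounting described above, useful for reasoning about why available_credits dips during a call and recovers afterwards. It is a sketch, not the server’s implementation: TeamBalance and hold_amount are hypothetical names, and behavior the page does not describe (for example, what happens when a hold exceeds the available balance) is deliberately not modeled.

```python
from dataclasses import dataclass, field


def hold_amount(max_input_tokens: int, max_tokens: int,
                input_credits_per_M: float, output_credits_per_M: float) -> float:
    """Worst-case reservation: max_tokens × output_rate + max_input × input_rate."""
    return (max_tokens * output_credits_per_M + max_input_tokens * input_credits_per_M) / 1_000_000


@dataclass
class TeamBalance:
    credits: float
    holds: dict = field(default_factory=dict)  # hold id -> reserved credits

    @property
    def held_credits(self) -> float:
        return sum(self.holds.values())

    @property
    def available_credits(self) -> float:
        # What is free to spend on the next call.
        return self.credits - self.held_credits

    def place_hold(self, hold_id: str, amount: float) -> None:
        self.holds[hold_id] = amount  # reserved, not billed

    def commit(self, hold_id: str, credits_charged: float) -> None:
        # Success: bill the actual charge; the rest of the hold is released.
        self.holds.pop(hold_id)
        self.credits -= credits_charged

    def release(self, hold_id: str) -> None:
        # Failure / cancellation: drop the whole hold, no charge.
        self.holds.pop(hold_id)
```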

Per-model spend caps (v1.1 roadmap)

For enterprise-grade cost control (“don’t let any single API key spend more than $50/day on aurous-grow-2.0-pro”), per-API-key spend caps are on the v1.1 roadmap. Today, the tools are:
  • GET /v1/balance to read your team’s available + held credit summary
  • GET /v1/usage with group_by=model and a model filter — to see per-model spend over time
  • Auto top-up settings in the dashboard (/dashboard/billing → Auto top-up) to prevent runaway depletion
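
Until per-key caps ship, one stopgap is a client-side guard that sums credits_charged from the receipts your own process sees. A minimal sketch: DailySpendGuard and the key labels are hypothetical, the cap is expressed in credits (convert from USD using your team’s billing settings), and the guard only knows about traffic that flows through this process, so it is no substitute for a server-side cap.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySpendGuard:
    """Client-side soft cap on credits per (key label, model, local calendar day)."""

    def __init__(self, cap_credits: float):
        self.cap_credits = cap_credits
        self.spent = defaultdict(float)  # (key_label, model, day) -> credits

    def allow(self, key_label: str, model: str, day: Optional[date] = None) -> bool:
        """Check before sending a request; False once the day's cap is reached."""
        day = day or date.today()
        return self.spent[(key_label, model, day)] < self.cap_credits

    def record(self, key_label: str, model: str, usage: dict, day: Optional[date] = None) -> None:
        """Add a response's usage.credits_charged to the running total."""
        day = day or date.today()
        self.spent[(key_label, model, day)] += usage["credits_charged"]
```

Because the charge is only known after a call completes, a single call can push the day’s total past the cap; that is why this is a soft cap rather than a hard limit.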
