usage block with token counts. Every credit charge is derived from those counts. This page explains exactly how Aurous Labs counts tokens for each input type, what’s the same as OpenAI, and where you’ll see drift.
Text tokens
Text input is tokenized using the underlying provider tokenizer for each model. Foraurous-grow-2.0-pro and aurous-embed-vision-1.0, that’s the BytePlus Doubao-family tokenizer, which is closely related to but not identical to OpenAI’s tiktoken cl100k_base / o200k_base encodings.
In practice:
- English-language input tokenizes within ±10% of OpenAI’s
o200k_base(thegpt-4otokenizer) on prose; ±20% on highly-repetitive content ("hello "*1000style). - Languages with non-Latin scripts (Chinese, Japanese, Korean, Arabic, etc.) can show 20-40% drift depending on script. The Doubao tokenizer is more efficient on CJK scripts than
cl100k_base; OpenAI’so200k_basehas narrowed that gap but not eliminated it. - Code (TypeScript, Python, JSON) tokenizes within ±15%.
- Repeated short tokens (
a a a a a a ...) can land 2-3× higher thantiktokenestimates because BytePlus does not aggressively merge them.
tiktoken, allow at least a 15-25% margin for safety on English prose and 30-40% on CJK workloads. The cheapest reliable estimate is a probe call (max_tokens: 1 + read prompt_tokens from the response).
How to know the actual count before the call
Two options:POST /v1/embeddings/estimatefor embeddings — returns the token count + the credit charge the real call would compute. No hold, no billed charge, no inferences row. See estimate docs.- Read
usage.prompt_tokensfrom a small probe call for chat — issue one cheap call with the full prompt andmax_tokens: 1, read theprompt_tokensfrom the response. This costs you one billed token of completion.
count_tokens endpoint today. Adding one is on the v1.1 roadmap.
Image tokens (chat multimodal + embedding visual)
Image inputs are converted into a fixed number of visual tokens at the model’s tokenizer layer. The count depends on the image’s pixel dimensions and the model’s tile-based encoder:aurous-grow-2.0-pro(chat multimodal images): each image contributes a per-image token count derived fromceil(width / 28) × ceil(height / 28) + overhead(subject to a hard cap per image). For practical UI screenshots (1280×720, say) expect ~1,200-1,400 visual tokens per image.aurous-embed-vision-1.0(embedding image parts): each image contributes a model-internal visual token count that is not broken out asprompt_tokenson the response. Instead, theusage.breakdown.input.visualfield reports the visual-token count specifically, billed at the visual rate (separate from text). See Embeddings pricing.
tile-based scheme published for gpt-4o) is roughly similar but not identical. Don’t try to predict our count from OpenAI’s table — read the actual count off usage.breakdown.input.visual (embeddings) or usage.prompt_tokens (chat) from a probe call.
Image URL fetching
When you passimage_url: { url: "https://..." }, we fetch the bytes server-side at request time (HTTPS only, 10-second timeout, RFC1918 / loopback / link-local URLs blocked). The fetched bytes are then tokenized; if the URL is unreachable, we surface a provider_request_invalid error rather than running the model on a partial input. See URL fetching.
Video tokens (embedding only today)
aurous-embed-vision-1.0 accepts a single video part per request. Video tokens are computed from (duration_seconds × frames_per_second × per_frame_visual_token_count) and capped by the model’s context window.
Practical reference: a 30-second 1080p video → ~3,000-5,000 visual tokens. The usage.breakdown.input.video field on the response reports the exact count.
Output tokens (chat only)
For chat completions, output tokens are counted by the model as it generates.usage.completion_tokens is the count after the response finishes; for streamed responses, the count is on the final non-[DONE] chunk’s usage block (we attach usage to the last data frame, not as a separate frame, matching OpenAI’s streaming-with-usage shape).
Reasoning tokens
For reasoning-capable models invoked withreasoning_effort: "low"|"medium"|"high", the model internally produces “reasoning tokens” that don’t appear in the visible content but ARE counted toward the output cost. We surface them separately as usage.reasoning_tokens so you can see what reasoning effort actually cost.
completion_tokens is the total of visible-output + reasoning tokens, matching OpenAI’s reasoning-model semantics.
Cached input tokens — v1.1
BytePlus auto-caches stable prompt prefixes (≥ ~1,024 tokens) onaurous-grow-2.0-pro. We observe the cache hit count internally but do not pass the discount through to customers in v1.0 — the full input rate applies to all input tokens regardless of cache hit status. This is a margin policy, not a technical limitation; see Cost transparency for the rationale.
Coming in v1.1: customer-visible usage.breakdown.cached_input_tokens counter + per-team opt-in for cache-discount pass-through. Track the changelog or launch-week followups for the timeline.
We do NOT expose a manual context-cache control plane (the context.create / context.attach flow some platforms have). Auto-caching at the provider layer is good enough for the typical multi-turn pattern.
Putting it all together — sample receipts
Chat (text-only, system + user, ~50 + ~50 tokens, ~50-token reply)
Chat (with reasoning_effort=medium, ~30 + ~10 input, ~100 visible + ~400 reasoning)
reasoning_tokens is a top-level field on usage (not nested under breakdown) and is NOT included in completion_tokens — it’s a separate count. The reasoning rate equals the visible-output rate today; breakdown.reasoning_credits reports the credit subtotal attributable to reasoning so it reconciles cleanly with credits_charged.
Embedding (text only, ~50 tokens)
Embedding (text + 1 image, ~1200 visual tokens)
breakdown.input.{text, visual, video} are credit amounts, not token counts. The token counts (per-modality, computed by the model’s tokenizer) are not surfaced in the response — only the aggregate prompt_tokens is. To estimate them, use POST /v1/embeddings/estimate.
Where to next?
- Cost transparency — reading the
breakdownblock end-to-end - Chat pricing — per-token rates for chat
- Embedding pricing — per-modality rates for embeddings
GET /v1/models— per-modelchat_pricing/embedding_pricing(the live rate, including any per-team overrides)

