aurous_metadata.capabilities array contains "reasoning_effort" support an OpenAI-compatible reasoning_effort parameter. Setting it lets the model spend more or fewer hidden tokens on deliberation before producing the visible answer. aurous-grow-2.0-pro is reasoning-capable.
The parameter
"minimal", "low", "medium" (default), "high".
| Effort | Behavior | When to use |
|---|---|---|
minimal | Reasoning suppressed entirely (model emits only visible output). Fastest, cheapest. | Stateless single-turn answers, classification, no multi-step thinking needed. |
low | Minimal hidden reasoning. Fast, cheap. | Snappy chat replies, summarization, lightweight extraction. |
medium | Default. Balanced. | Day-to-day prompts where you want quality without paying full freight. |
high | Maximum hidden reasoning. Slower, more reasoning tokens billed. | Hard math, multi-step planning, tricky tool-use chains, code generation that has to compile first try. |
Billing for reasoning tokens
Reasoning tokens are visible in the responseusage block as reasoning_tokens — a SEPARATE count from completion_tokens (which covers visible output text only). They contribute to credits_charged at the model’s output rate; the credit subtotal is broken out as breakdown.reasoning_credits so the four credit lines reconcile cleanly with credits_charged:
reasoning_effort: "high" turn on a hard problem can spend 1500–3000 reasoning tokens. Budget for it, or set low when you don’t need the depth.
Example
When the model doesn’t support reasoning effort
If you setreasoning_effort on a model without the capability, the parameter is silently ignored — the request still runs at the model’s default behavior. To verify capability before sending, check aurous_metadata.capabilities on the /v1/models listing:
Streaming with reasoning
Reasoning models support streaming. Reasoning tokens are NOT streamed back as content deltas — they appear only in the final chunk’susage.reasoning_tokens count. From the client’s perspective, the first content delta arrives after the model finishes its hidden reasoning pass.
This means streamed reasoning responses have a longer “time to first token” than non-reasoning streams. If your UX shows a typing indicator, keep it visible during the silence; the keep-alive comment frame (: keep-alive) confirms the connection is healthy.
Practical guidance
- Default to
medium. Tune up or down based on task category. - For agentic tool-use loops with
high, watchreasoning_tokenscarefully — the cost can dwarf input + visible-output tokens combined. - For latency-sensitive UX (typeahead, completions inside a form), prefer
lowor omit the param entirely.

