Skip to main content
Models whose aurous_metadata.capabilities array contains "reasoning_effort" support an OpenAI-compatible reasoning_effort parameter. Setting it lets the model spend more or fewer hidden tokens on deliberation before producing the visible answer. aurous-grow-2.0-pro is reasoning-capable.

The parameter

{
  "model": "aurous-grow-2.0-pro",
  "messages": [...],
  "reasoning_effort": "medium",
  "max_tokens": 2048
}
Accepted values: "minimal", "low", "medium" (default), "high".
EffortBehaviorWhen to use
minimalReasoning suppressed entirely (model emits only visible output). Fastest, cheapest.Stateless single-turn answers, classification, no multi-step thinking needed.
lowMinimal hidden reasoning. Fast, cheap.Snappy chat replies, summarization, lightweight extraction.
mediumDefault. Balanced.Day-to-day prompts where you want quality without paying full freight.
highMaximum hidden reasoning. Slower, more reasoning tokens billed.Hard math, multi-step planning, tricky tool-use chains, code generation that has to compile first try.

Billing for reasoning tokens

Reasoning tokens are visible in the response usage block as reasoning_tokens — a SEPARATE count from completion_tokens (which covers visible output text only). They contribute to credits_charged at the model’s output rate; the credit subtotal is broken out as breakdown.reasoning_credits so the four credit lines reconcile cleanly with credits_charged:
{
  "usage": {
    "prompt_tokens": 180,
    "completion_tokens": 420,
    "reasoning_tokens": 350,
    "total_tokens": 950,
    "credits_charged": 0.3645,
    "breakdown": {
      "input_credits": 0.0135,
      "output_credits": 0.1890,
      "reasoning_credits": 0.1620,
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 1
    }
  }
}
A reasoning_effort: "high" turn on a hard problem can spend 1500–3000 reasoning tokens. Budget for it, or set low when you don’t need the depth.

Example

curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [
      {
        "role": "user",
        "content": "An array contains [3, -1, 4, -2, 5, -3, 6]. Find the contiguous subarray with the largest sum. Show your work."
      }
    ],
    "reasoning_effort": "high",
    "max_tokens": 2048
  }'

When the model doesn’t support reasoning effort

If you set reasoning_effort on a model without the capability, the parameter is silently ignored — the request still runs at the model’s default behavior. To verify capability before sending, check aurous_metadata.capabilities on the /v1/models listing:
curl https://api.aurous-labs.com/v1/models \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  | jq '.data[] | select(.id == "aurous-grow-2.0-pro") | .aurous_metadata.capabilities'
# → ["streaming","tools","multimodal_input","reasoning_effort","structured_output"]

Streaming with reasoning

Reasoning models support streaming. Reasoning tokens are NOT streamed back as content deltas — they appear only in the final chunk’s usage.reasoning_tokens count. From the client’s perspective, the first content delta arrives after the model finishes its hidden reasoning pass. This means streamed reasoning responses have a longer “time to first token” than non-reasoning streams. If your UX shows a typing indicator, keep it visible during the silence; the keep-alive comment frame (: keep-alive) confirms the connection is healthy.

Practical guidance

  • Default to medium. Tune up or down based on task category.
  • For agentic tool-use loops with high, watch reasoning_tokens carefully — the cost can dwarf input + visible-output tokens combined.
  • For latency-sensitive UX (typeahead, completions inside a form), prefer low or omit the param entirely.