Reasoning effort

Models whose aurous_metadata.capabilities array contains "reasoning_effort" support an OpenAI-compatible reasoning_effort parameter. Setting it lets the model spend more or fewer hidden tokens on deliberation before producing the visible answer. aurous-grow-2.0-pro is reasoning-capable.

The parameter

{
  "model": "aurous-grow-2.0-pro",
  "messages": [...],
  "reasoning_effort": "medium",
  "max_tokens": 2048
}

Accepted values: "minimal", "low", "medium" (default), "high".

Effort	Behavior	When to use
`minimal`	Reasoning suppressed entirely (model emits only visible output). Fastest, cheapest.	Stateless single-turn answers, classification, no multi-step thinking needed.
`low`	Minimal hidden reasoning. Fast, cheap.	Snappy chat replies, summarization, lightweight extraction.
`medium`	Default. Balanced.	Day-to-day prompts where you want quality without paying full freight.
`high`	Maximum hidden reasoning. Slower, more reasoning tokens billed.	Hard math, multi-step planning, tricky tool-use chains, code generation that has to compile first try.

Billing for reasoning tokens

Reasoning tokens are visible in the response usage block as reasoning_tokens — a SEPARATE count from completion_tokens (which covers visible output text only). They contribute to credits_charged at the model’s output rate; the credit subtotal is broken out as breakdown.reasoning_credits so the four credit lines reconcile cleanly with credits_charged:

{
  "usage": {
    "prompt_tokens": 180,
    "completion_tokens": 420,
    "reasoning_tokens": 350,
    "total_tokens": 950,
    "credits_charged": 0.3645,
    "breakdown": {
      "input_credits": 0.0135,
      "output_credits": 0.1890,
      "reasoning_credits": 0.1620,
      "model": "aurous-grow-2.0-pro",
      "pricing_version": 1
    }
  }
}

A reasoning_effort: "high" turn on a hard problem can spend 1500–3000 reasoning tokens. Budget for it, or set low when you don’t need the depth.

Example

curl -X POST https://api.aurous-labs.com/v1/chat/completions \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-grow-2.0-pro",
    "messages": [
      {
        "role": "user",
        "content": "An array contains [3, -1, 4, -2, 5, -3, 6]. Find the contiguous subarray with the largest sum. Show your work."
      }
    ],
    "reasoning_effort": "high",
    "max_tokens": 2048
  }'

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!,
});

const res = await client.chat.completions.create({
  model: "aurous-grow-2.0-pro",
  messages: [
    {
      role: "user",
      content:
        "An array contains [3, -1, 4, -2, 5, -3, 6]. Find the contiguous subarray with the largest sum. Show your work.",
    },
  ],
  // @ts-expect-error reasoning_effort lands on the openai TS types alongside o1; passes through.
  reasoning_effort: "high",
  max_tokens: 2048,
});

console.log(res.choices[0].message.content);
console.log("reasoning_tokens:", res.usage?.reasoning_tokens);

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

res = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[
        {
            "role": "user",
            "content": (
                "An array contains [3, -1, 4, -2, 5, -3, 6]. Find the "
                "contiguous subarray with the largest sum. Show your work."
            ),
        }
    ],
    reasoning_effort="high",
    max_tokens=2048,
)

print(res.choices[0].message.content)
print("reasoning_tokens:", res.usage.reasoning_tokens)

When the model doesn’t support reasoning effort

If you set reasoning_effort on a model without the capability, the parameter is silently ignored — the request still runs at the model’s default behavior. To verify capability before sending, check aurous_metadata.capabilities on the /v1/models listing:

curl https://api.aurous-labs.com/v1/models \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  | jq '.data[] | select(.id == "aurous-grow-2.0-pro") | .aurous_metadata.capabilities'
# → ["streaming","tools","multimodal_input","reasoning_effort","structured_output"]

Streaming with reasoning

Reasoning models support streaming. Reasoning tokens are NOT streamed back as content deltas — they appear only in the final chunk’s usage.reasoning_tokens count. From the client’s perspective, the first content delta arrives after the model finishes its hidden reasoning pass. This means streamed reasoning responses have a longer “time to first token” than non-reasoning streams. If your UX shows a typing indicator, keep it visible during the silence; the keep-alive comment frame (: keep-alive) confirms the connection is healthy.

Practical guidance

Default to medium. Tune up or down based on task category.
For agentic tool-use loops with high, watch reasoning_tokens carefully — the cost can dwarf input + visible-output tokens combined.
For latency-sensitive UX (typeahead, completions inside a form), prefer low or omit the param entirely.

Get started

Guides

Concepts

API Reference

Resources

The parameter

Billing for reasoning tokens

Example

When the model doesn’t support reasoning effort

Streaming with reasoning

Practical guidance

​The parameter

​Billing for reasoning tokens

​Example

​When the model doesn’t support reasoning effort

​Streaming with reasoning

​Practical guidance

The parameter

Billing for reasoning tokens

Example

When the model doesn’t support reasoning effort

Streaming with reasoning

Practical guidance