Choose a model

Aurous Labs ships two LLM-family models on day one. They serve different purposes; this page lays out the decision in one table and walks through the rationale.

At a glance

Capability	`aurous-grow-2.0-pro`	`aurous-embed-vision-1.0`
Kind	Chat completion	Embedding
Endpoint	`POST /v1/chat/completions`	`POST /v1/embeddings`
What it returns	An assistant message (text)	A single high-dim vector
Native vector dimensions	n/a	1,024 or 2,048 (Matryoshka-compatible)
Context window	256K tokens	128K tokens
Multimodal input	Text + images	Text + images + video
Tool / function calling	Yes (`tools` + `tool_choice`)	No
Structured output	Yes (`response_format` json_schema)	No
Streaming	Yes (SSE)	No (single-shot response)
Reasoning effort	Yes (`reasoning_effort: low/med/high`)	No
Idempotency	Non-streamed only	Always
OpenAI SDK drop-in	`client.chat.completions.create`	`client.embeddings.create`
Day-1 pricing (per 1M tokens)	$0.21 in / $0.84 out (text)	$0.04 text / $0.36 visual / $0.18 video
Latency p50 (typical request)	1.5-3s sync / first-byte ~400ms streamed	200-400ms text / 1-2s w/ media

(Pricing is illustrative — fetch GET /v1/models for the live rates on each model row; the API is the source of truth.)
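
As a rough illustration of how the per-1M-token rates in the table turn into a per-request cost, the sketch below hard-codes the illustrative day-1 text rates for aurous-grow-2.0-pro. The function name and constants are hypothetical and exist only for this example; in practice, read the live rates from GET /v1/models.

```python
# Illustrative day-1 text rates from the table above, in USD per 1M tokens.
# These constants are assumptions for the sketch; the live rates come from
# GET /v1/models.
CHAT_INPUT_USD_PER_1M = 0.21
CHAT_OUTPUT_USD_PER_1M = 0.84


def estimate_chat_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one text-only chat request."""
    input_cost = input_tokens / 1_000_000 * CHAT_INPUT_USD_PER_1M
    output_cost = output_tokens / 1_000_000 * CHAT_OUTPUT_USD_PER_1M
    return input_cost + output_cost


# A request with 10,000 prompt tokens and 500 completion tokens.
print(f"${estimate_chat_cost(10_000, 500):.5f}")
```

At these rates, output tokens cost four times as much as input tokens, so capping max_tokens is usually the first lever for cost control.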

Pick `aurous-grow-2.0-pro` when

You need text out. Examples:

Conversational interfaces (chat UI, AI assistant, customer support)
Tool / function calling (let the model call your APIs)
Structured extraction (turn unstructured docs into JSON)
Summarization, rewriting, translation
Code generation, code explanation, refactoring suggestions
Multimodal Q&A (paste an image, ask a question about it)
Reasoning-heavy workflows (math, multi-step planning) via reasoning_effort: "high"

Don’t pick aurous-grow-2.0-pro for similarity search, retrieval, or clustering; those want embeddings.

Quick example

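from openai import OpenAI

# Assumes the API key and the Aurous base URL are supplied through the SDK's
# standard OPENAI_API_KEY and OPENAI_BASE_URL environment variables.
client = OpenAI()

res = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[
        {"role": "user", "content": "Explain microservices to a senior backend engineer in 2 sentences."},
    ],
    max_tokens=80,
)
print(res.choices[0].message.content)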
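
For the structured-extraction use case, a request might be assembled as in the sketch below. It assumes the endpoint accepts the same json_schema response_format payload shape as the OpenAI SDK; the table above lists the feature but not the exact payload, so check the Chat overview before relying on it. The invoice schema and the build_extraction_request helper are hypothetical.

```python
import json

# Hypothetical schema for the example: pull two fields out of an invoice.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total_usd": {"type": "number"},
    },
    "required": ["vendor", "total_usd"],
    "additionalProperties": False,
}


def build_extraction_request(document_text: str) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Assumption: the payload mirrors the OpenAI SDK's json_schema
    response_format; the exact shape accepted here is not confirmed.
    """
    return {
        "model": "aurous-grow-2.0-pro",
        "messages": [
            {"role": "system", "content": "Extract the invoice fields as JSON."},
            {"role": "user", "content": document_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "schema": INVOICE_SCHEMA},
        },
    }


request = build_extraction_request("Invoice from Acme Corp. Total due: $1,250.00")
print(json.dumps(request["response_format"], indent=2))
# With a configured client: res = client.chat.completions.create(**request)
```

Building the request as a plain dict keeps the schema testable without a network call.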

Pick `aurous-embed-vision-1.0` when

You need a fixed-length numeric vector representing the input’s meaning. Examples:

Semantic search — embed your documents at index time, embed the user’s query at search time, return top-K nearest documents by cosine similarity
Retrieval-augmented generation (RAG) — pull semantically-relevant chunks from a knowledge base, then pass them into a chat completion as context
Classification — embed labeled examples once, then embed new inputs and route based on nearest-neighbor label
Recommendation — “more like this” via vector similarity
Deduplication — cluster vectors and collapse near-duplicates
Multimodal cross-search — find images via a text query (or vice versa) because text and image embeddings share the same vector space

Don’t pick aurous-embed-vision-1.0 for generating text, answering questions, or calling tools; those want a chat model.

Quick example

res = client.embeddings.create(
    model="aurous-embed-vision-1.0",
    input="The quick brown fox jumps over the lazy dog.",
)
print(len(res.data[0].embedding))  # 2048 native
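
Once vectors are stored, the semantic-search pattern described above reduces to ranking by cosine similarity. The following is a minimal pure-Python sketch with toy 3-dimensional vectors standing in for real embeddings; the helper names and document ids are hypothetical, and a production system would more likely delegate this to a vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dim vectors standing in for real embeddings.
index = {
    "doc-fox": [0.9, 0.1, 0.0],
    "doc-tax": [0.0, 0.2, 0.9],
    "doc-dog": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]
print([doc_id for doc_id, _ in top_k(query_vector, index, k=2)])
```

The same ranking function covers the classification and recommendation bullets: swap the index for labeled examples or catalog items.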

The multimodal differentiator

aurous-embed-vision-1.0 is multimodal in one call — text + image + video → one combined vector. This is the platform’s headline differentiator over OpenAI’s embedding API, which is text-only.

res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
        ],
    },
    cast_to=dict,
)
# Returns a single 2048-dim vector that captures the combined semantics
# of the text + image, not two separate vectors

The combined vector lets you do cross-modal search out of the box: index documents with their associated images, then query with text alone to find documents whose image content is semantically relevant. See Multimodal embeddings for the full pattern. With OpenAI’s models, cross-modal search requires running two models (CLIP for images + text-embedding-3 for text) and learning a shared projection between them. Aurous gives you the shared space natively.

The 256K context window (chat) vs 128K (embedding)

The chat model handles long-context Q&A and document summarization (long PDFs, full meeting transcripts, large code repositories) up to 256K tokens (~200K English words). The embedding model is capped at 128K — appropriate for one document chunk plus context. If you need to embed a 500K-token document, chunk it: split into 5 × 100K chunks, embed each, and either:

Average the chunk vectors into a single vector (the mean still works with cosine similarity) for whole-doc retrieval
Index each chunk independently for fine-grained retrieval (recommended for RAG)
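
A minimal sketch of the chunk-then-average option follows. It assumes the document is already tokenized (the tokenizer is not specified on this page, so a placeholder token list is used) and uses toy 2-dimensional vectors in place of real per-chunk embeddings. Rescaling the mean to unit length is an optional extra here: cosine similarity is scale-invariant, but unit vectors are convenient if you also use dot-product search.

```python
import math


def split_into_chunks(tokens: list[str], max_tokens: int) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most max_tokens."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average equal-length vectors and rescale the result to unit length."""
    dims = len(vectors[0])
    mean = [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dims)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


# A 500K-token document split at 100K tokens per chunk.
tokens = ["tok"] * 500_000  # placeholder tokens, not real tokenizer output
chunks = split_into_chunks(tokens, 100_000)
print(len(chunks))

# Toy 2-dim vectors standing in for the per-chunk embeddings.
doc_vector = mean_vector([[1.0, 0.0], [0.0, 1.0]])
print(doc_vector)
```

Averaging blurs topic shifts inside a long document, which is why per-chunk indexing tends to work better for RAG.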

Common patterns

RAG pipeline

   ┌──────────────────────────────────────────────────┐
   │  Document ingest                                 │
   │  - chunk source into ~512-token pieces           │
   │  - embed each chunk via aurous-embed-vision-1.0  │
   │  - store {chunk_id, vector, text}                │
   │    in a vector DB (Pinecone, Weaviate, etc.)     │
   └──────────────────────────────────────────────────┘
                            │
                            ▼
   ┌──────────────────────────────────────────────────┐
   │  Query time                                      │
   │  1. Embed the user's question                    │
   │  2. Top-K nearest chunks via cosine sim          │
   │  3. Send {question, top_K_chunks}                │
   │     to aurous-grow-2.0-pro for answer synthesis  │
   └──────────────────────────────────────────────────┘

You’ll use both models in this pipeline: aurous-embed-vision-1.0 for indexing and query embedding, aurous-grow-2.0-pro for the final answer synthesis.
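
Step 3 of the query-time box, sending the question plus the retrieved chunks to the chat model, could be assembled as in the sketch below. The prompt wording and the build_rag_messages helper are hypothetical; the retrieved chunks are hard-coded here in place of a real vector-DB lookup.

```python
def build_rag_messages(question: str, chunks: list[str]) -> list[dict]:
    """Assemble chat messages that ground the answer in retrieved chunks."""
    # Number each chunk so the model can refer to the one it used.
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return [
        {
            "role": "system",
            "content": "Answer using only the numbered context. "
                       "Say so if the context does not contain the answer.",
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


# Stand-ins for the top-K chunks returned by the vector DB.
top_chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Refund requests go through the billing portal.",
]
messages = build_rag_messages("How long do I have to request a refund?", top_chunks)
print(messages[1]["content"])
# With a configured client:
# res = client.chat.completions.create(model="aurous-grow-2.0-pro", messages=messages)
```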

Multimodal product catalog

   ingest:  for each product →
              chunk = title + description + product_image
              vector = embed(chunk)  // text + image in one call
              store {product_id, vector}

   search:  query = embed("comfortable running shoes for trail")
            results = top_K(cosine_sim(query, all_product_vectors))
            // returns products whose description AND image content
            // are semantically relevant — even if "comfortable" doesn't
            // appear in any product text
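
The ingest step in the pseudocode above can be made concrete by building the mixed text + image input for each product. This sketch reuses the input shape from the multimodal example earlier on this page; the build_product_input helper and the sample product are hypothetical.

```python
def build_product_input(title: str, description: str, image_url: str) -> list[dict]:
    """Return the mixed text + image input parts for one product."""
    return [
        {"type": "text", "text": f"{title}\n{description}"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


product_input = build_product_input(
    title="Trail Runner 3",
    description="Cushioned shoe with a grippy outsole for rocky terrain.",
    image_url="https://example.com/trail-runner-3.jpg",
)
body = {"model": "aurous-embed-vision-1.0", "input": product_input}
print(body["input"][1]["image_url"]["url"])
# With a configured client:
# res = client.post("/embeddings", body=body, cast_to=dict)
```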

Day-1 limitations to know

Only one chat model + one embedding model. A “mini” or “haiku” chat tier and a text-only embedding tier are on the v1.1 roadmap for cost-sensitive workloads.
No image generation in chat. aurous-grow-2.0-pro does NOT call /v1/images for you; if you want a chat completion to produce an image, you write the tool call yourself. (See Chat tools.)
No audio yet. No speech-to-text, no text-to-speech, no audio embeddings on the platform.
No assistants / threads API. The chat completions surface is stateless; if you need threading, build it client-side.
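
Because the chat surface is stateless, client-side threading mostly means resending the accumulated history on every turn. The sketch below shows one way to do that; the Conversation class is hypothetical, and fake_send is a stand-in for a function that wraps client.chat.completions.create and returns the assistant text.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps chat history client-side, since the API itself is stateless."""

    def __init__(self, send: Callable[[list[Message]], str], system_prompt: str = ""):
        self._send = send
        self.messages: list[Message] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text: str) -> str:
        """Append the user turn, send the full history, and record the reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call: reports how many user turns it received.
def fake_send(messages: list[Message]) -> str:
    return f"(reply to turn {sum(m['role'] == 'user' for m in messages)})"


thread = Conversation(fake_send, system_prompt="You are concise.")
thread.ask("What is an embedding?")
print(thread.ask("And how is it different from a chat completion?"))
print(len(thread.messages))
```

History grows with every turn, so long threads eventually need trimming or summarizing to stay inside the 256K context window.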

Where to next?

Chat overview — the full chat surface
Embeddings overview — the full embedding surface
Multimodal embeddings — the multimodal differentiator deep dive
Pricing — chat rate card
Embedding pricing — embedding rate card with per-modality math
GET /v1/models — the live model catalog
