Aurous Labs ships two LLM-family models on day one. They serve different purposes; this page lays out the decision in one table and walks through the rationale.

At a glance

Capability | aurous-grow-2.0-pro | aurous-embed-vision-1.0
Kind | Chat completion | Embedding
Endpoint | POST /v1/chat/completions | POST /v1/embeddings
What it returns | An assistant message (text) | A single high-dim vector
Native vector dimensions | n/a | 1,024 + 2,048 (Matryoshka-compatible)
Context window | 256K tokens | 128K tokens
Multimodal input | Text + images | Text + images + video
Tool / function calling | Yes (tools + tool_choice) | No
Structured output | Yes (response_format json_schema) | No
Streaming | Yes (SSE) | No (single-shot response)
Reasoning effort | Yes (reasoning_effort: low/med/high) | No
Idempotency | Non-streamed only | Always
OpenAI SDK drop-in | client.chat.completions.create | client.embeddings.create
Day-1 pricing (per 1M tokens) | $0.21 in / $0.84 out (text) | $0.04 text / $0.36 visual / $0.18 video
Latency p50 (typical request) | 1.5-3s sync / first-byte ~400ms streamed | 200-400ms text / 1-2s w/ media
(Pricing is illustrative — fetch GET /v1/models for the live rates on each model row; the API is the source of truth.)
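As a rough worked example of the pricing row, the sketch below estimates the cost of one text chat request from its token counts. The rates are the illustrative table values, hard-coded here as an assumption, and `estimate_chat_cost` is a hypothetical helper rather than part of any SDK; production code should read the live rates from GET /v1/models instead.

```python
# Illustrative per-1M-token rates copied from the table above (assumption:
# real rates must come from GET /v1/models, which is the source of truth).
CHAT_INPUT_USD_PER_M = 0.21
CHAT_OUTPUT_USD_PER_M = 0.84


def estimate_chat_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one text chat request."""
    input_cost = input_tokens / 1_000_000 * CHAT_INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * CHAT_OUTPUT_USD_PER_M
    return input_cost + output_cost


# 200K tokens of context in, 1K tokens of answer out.
cost = estimate_chat_cost(200_000, 1_000)
print(f"${cost:.4f}")
```

Input tokens dominate the bill for long-context requests like this one, even though output tokens cost four times as much per token.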

Pick aurous-grow-2.0-pro when

You need text out. Examples:
  • Conversational interfaces (chat UI, AI assistant, customer support)
  • Tool / function calling (let the model call your APIs)
  • Structured extraction (turn unstructured docs into JSON)
  • Summarization, rewriting, translation
  • Code generation, code explanation, refactoring suggestions
  • Multimodal Q&A (paste an image, ask a question about it)
  • Reasoning-heavy workflows (math, multi-step planning) via reasoning_effort: "high"
Don’t pick aurous-grow-2.0-pro for similarity search, retrieval, or clustering; those want embeddings.

Quick example

# Assumes `client` is an OpenAI SDK client configured for the Aurous API.
res = client.chat.completions.create(
    model="aurous-grow-2.0-pro",
    messages=[
        {"role": "user", "content": "Explain microservices to a senior backend engineer in 2 sentences."},
    ],
    max_tokens=80,
)
print(res.choices[0].message.content)
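For the structured-extraction use case, the request can be assembled as a plain dictionary before it is sent. This is a minimal sketch: the `response_format` shape follows the OpenAI SDK's `json_schema` convention, which this page says the model accepts, but the invoice schema, the prompt, and the `build_extraction_request` helper are all hypothetical.

```python
import json


def build_extraction_request(document_text: str) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    # Hypothetical schema: the fields we want back as JSON.
    invoice_schema = {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
        "additionalProperties": False,
    }
    return {
        "model": "aurous-grow-2.0-pro",
        "messages": [
            {"role": "system", "content": "Extract the invoice fields as JSON."},
            {"role": "user", "content": document_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "schema": invoice_schema},
        },
    }


request = build_extraction_request("ACME Corp invoice, total due 1,250.00 USD")
print(json.dumps(request["response_format"], indent=2))
# With a configured client: res = client.chat.completions.create(**request)
```

Building the request separately from the network call keeps the schema easy to unit-test without hitting the API.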

Pick aurous-embed-vision-1.0 when

You need a fixed-length numeric vector representing the input’s meaning. Examples:
  • Semantic search — embed your documents at index time, embed the user’s query at search time, return top-K nearest documents by cosine similarity
  • Retrieval-augmented generation (RAG) — pull semantically-relevant chunks from a knowledge base, then pass them into a chat completion as context
  • Classification — embed labeled examples once, then embed new inputs and route based on nearest-neighbor label
  • Recommendation — “more like this” via vector similarity
  • Deduplication — cluster vectors and collapse near-duplicates
  • Multimodal cross-search — find images via a text query (or vice versa) because text and image embeddings share the same vector space
Don’t pick aurous-embed-vision-1.0 for generating text, answering questions, or calling tools; those want a chat model.
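Most of the use cases above reduce to the same operation: rank stored vectors by cosine similarity to a query vector and keep the top K. The sketch below shows that ranking in plain Python with toy 3-dimensional vectors standing in for real embeddings; `cosine_similarity` and `top_k` are illustrative helpers, and a vector database would normally do this step for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real 2048-dim embeddings.
index = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-taxes": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]
results = top_k(query_vector, index, k=2)
print([doc_id for doc_id, _ in results])
```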

Quick example

res = client.embeddings.create(
    model="aurous-embed-vision-1.0",
    input="The quick brown fox jumps over the lazy dog.",
)
print(len(res.data[0].embedding))  # 2048 native
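The table lists the dimensions as Matryoshka-compatible, which conventionally means a shorter vector can be obtained by keeping the leading values of the full one and rescaling to unit length. The sketch below does that client-side. It assumes the 1,024-dim form is the prefix of the 2,048-dim vector, which this page does not confirm, and `truncate_embedding` is a hypothetical helper.

```python
import math


def truncate_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


# Toy stand-in for a 2048-dim vector returned by the API.
full = [0.5] * 2048
short = truncate_embedding(full, 1024)
print(len(short))
```

Shorter vectors halve storage and speed up similarity search, usually at some cost in retrieval quality, so it is worth measuring on your own data before committing to either size.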

The multimodal differentiator

aurous-embed-vision-1.0 is multimodal in a single call: text + image + video → one combined vector. This is the platform’s headline differentiator over OpenAI’s embedding API, which is text-only.

res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
        ],
    },
    cast_to=dict,
)
# Returns a single 2048-dim vector that captures the combined semantics
# of the text + image, not two separate vectors

The combined vector lets you do cross-modal search out of the box: index documents with their associated images, then query with text alone and find documents whose image content is semantically relevant. See Multimodal embeddings for the full pattern. OpenAI’s solution to cross-modal search requires running two models (CLIP for images + text-embedding-3 for text) and learning a shared projection. Aurous gives you the shared space natively.

The 256K context window (chat) vs 128K (embedding)

The chat model handles long-context Q&A and document summarization (long PDFs, full meeting transcripts, large code repositories) up to 256K tokens (~200K English words). The embedding model is capped at 128K — appropriate for one document chunk plus context. If you need to embed a 500K-token document, chunk it: split into 5 × 100K chunks, embed each, and either:
  • Average the chunk vectors into one document vector (the mean still works with cosine similarity) for whole-document retrieval
  • Index each chunk independently for fine-grained retrieval (recommended for RAG)
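The chunk-then-combine approach can be sketched as follows. The document is represented as a list of placeholder tokens, since this page does not describe the tokenizer, and toy 2-dimensional vectors stand in for the embeddings the API would return for each chunk. `split_into_chunks` and `mean_vector` are illustrative helpers.

```python
def split_into_chunks(tokens: list[str], max_tokens: int) -> list[list[str]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# A 500K-token document represented as placeholder tokens.
document_tokens = ["tok"] * 500_000
chunks = split_into_chunks(document_tokens, max_tokens=100_000)
print(len(chunks))

# Toy 2-dim vectors standing in for the five real chunk embeddings.
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_vector = mean_vector(chunk_vectors)
print(doc_vector)
```

Averaging discards which chunk said what, which is why per-chunk indexing is the better fit when the retrieved text is going to be passed to a chat model.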

Common patterns

RAG pipeline

   ┌──────────────────────────────────────────────────┐
   │  Document ingest                                 │
   │  - chunk source into ~512-token pieces           │
   │  - embed each chunk via aurous-embed-vision-1.0  │
   │  - store {chunk_id, vector, text}                │
   │    in a vector DB (Pinecone, Weaviate, etc)      │
   └──────────────────────────────────────────────────┘


   ┌──────────────────────────────────────────────────┐
   │  Query time                                      │
   │  1. Embed the user's question                    │
   │  2. Top-K nearest chunks via cosine sim          │
   │  3. Send {question, top_K_chunks}                │
   │     to aurous-grow-2.0-pro for answer synthesis  │
   └──────────────────────────────────────────────────┘

You’ll use both models in this pipeline: aurous-embed-vision-1.0 for indexing and query embedding, aurous-grow-2.0-pro for the final answer synthesis.
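The two stages can be sketched end to end with an in-memory index. The embedding step is passed in as a function so the flow runs offline; `toy_embed` is a keyword-count stand-in and not a real embedding, and in practice it would be replaced by a call to aurous-embed-vision-1.0. All helper names and the system prompt are hypothetical.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_index(chunks: list[str], embed: Callable[[str], Vector]) -> list[tuple[str, Vector]]:
    """Ingest stage: embed every chunk once and keep (text, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, index: list[tuple[str, Vector]],
             embed: Callable[[str], Vector], k: int) -> list[str]:
    """Query stage, steps 1-2: embed the question and return the top-k chunks."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_messages(question: str, context_chunks: list[str]) -> list[dict]:
    """Query stage, step 3: chat messages carrying the retrieved context."""
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (keyword counts); NOT a real embedding model."""
    lowered = text.lower()
    return [lowered.count(word) + 0.01 for word in ("refund", "shipping", "password")]


chunks = [
    "Refund requests are processed within 5 days.",
    "Shipping takes 3 to 7 business days.",
    "Reset your password from the account page.",
]
index = build_index(chunks, toy_embed)
question = "How do I get a refund?"
top_chunks = retrieve(question, index, toy_embed, k=1)
messages = build_messages(question, top_chunks)
print(messages[1]["content"])
# With a configured client:
# client.chat.completions.create(model="aurous-grow-2.0-pro", messages=messages)
```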

Multimodal product catalog

   ingest:  for each product →
              chunk = title + description + product_image
              vector = embed(chunk)  // text + image in one call
              store {product_id, vector}

   search:  query = embed("comfortable running shoes for trail")
            results = top_K(cosine_sim(query, all_product_vectors))
            // returns products whose description AND image content
            // are semantically relevant — even if "comfortable" doesn't
            // appear in any product text
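The ingest step of this pattern amounts to building one multimodal `input` list per product. The sketch below mirrors the request shape from the multimodal example earlier on this page; the product data, the image URL, and both helper functions are hypothetical.

```python
def build_product_input(title: str, description: str, image_url: str) -> list[dict]:
    """Combine a product's text and image into one multimodal input list."""
    return [
        {"type": "text", "text": f"{title}. {description}"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


def build_embedding_body(input_parts: list[dict]) -> dict:
    """Request body for POST /v1/embeddings."""
    return {"model": "aurous-embed-vision-1.0", "input": input_parts}


# Hypothetical catalog entry.
parts = build_product_input(
    title="Ridgeline Trail Runner",
    description="Cushioned shoe with a grippy outsole for uneven terrain.",
    image_url="https://example.com/products/ridgeline.jpg",
)
body = build_embedding_body(parts)
print(body["input"][0]["text"])
# With a configured client, as in the earlier example:
# res = client.post("/embeddings", body=body, cast_to=dict)
```

The search side needs no image: a text-only query is embedded with the same model and compared against the stored product vectors.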

Day-1 limitations to know

  • Only one chat model + one embedding model. A “mini” or “haiku” chat tier and a text-only embedding tier are on the v1.1 roadmap for cost-sensitive workloads.
  • No image generation in chat. aurous-grow-2.0-pro does NOT call /v1/images for you; if you want a chat completion to produce an image, you write the tool call yourself. (See Chat tools.)
  • No audio yet. No speech-to-text, no text-to-speech, no audio embeddings on the platform.
  • No assistants / threads API. The chat completions surface is stateless; if you need threading, build it client-side.
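Because the chat completions surface is stateless, client-side threading mostly means keeping the message history yourself and resending it on every turn. A minimal sketch of that idea follows; the `Conversation` class is hypothetical and leaves out persistence and context-window trimming.

```python
class Conversation:
    """Client-side thread: keeps the full message history for a stateless API."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: list[dict] = [{"role": "system", "content": system_prompt}]

    def add_user(self, text: str) -> list[dict]:
        """Record a user turn and return a copy of the history to send."""
        self.messages.append({"role": "user", "content": text})
        return list(self.messages)

    def add_assistant(self, text: str) -> None:
        """Record the model's reply so the next request includes it."""
        self.messages.append({"role": "assistant", "content": text})


thread = Conversation("You are a concise assistant.")
first_payload = thread.add_user("What is a vector DB?")
# In real use the reply text would come from res.choices[0].message.content.
thread.add_assistant("A database built for nearest-neighbor search over embeddings.")
second_payload = thread.add_user("Name two.")
print([message["role"] for message in second_payload])
```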
