At a glance
| Capability | aurous-grow-2.0-pro | aurous-embed-vision-1.0 |
|---|---|---|
| Kind | Chat completion | Embedding |
| Endpoint | POST /v1/chat/completions | POST /v1/embeddings |
| What it returns | An assistant message (text) | A single high-dim vector |
| Native vector dimensions | n/a | 1,024 + 2,048 (Matryoshka-compatible) |
| Context window | 256K tokens | 128K tokens |
| Multimodal input | Text + images | Text + images + video |
| Tool / function calling | Yes (tools + tool_choice) | No |
| Structured output | Yes (response_format json_schema) | No |
| Streaming | Yes (SSE) | No (single-shot response) |
| Reasoning effort | Yes (reasoning_effort: low/med/high) | No |
| Idempotency | Non-streamed only | Always |
| OpenAI SDK drop-in | client.chat.completions.create | client.embeddings.create |
| Day-1 pricing (per 1M tokens) | 0.84 out (text) | 0.36 visual / $0.18 video |
| Latency p50 (typical request) | 1.5-3s sync / first-byte ~400ms streamed | 200-400ms text / 1-2s w/ media |
GET /v1/models for the live rates on each model row; the API is the source of truth.)
Pick aurous-grow-2.0-pro when
You need text out. Examples:
- Conversational interfaces (chat UI, AI assistant, customer support)
- Tool / function calling (let the model call your APIs)
- Structured extraction (turn unstructured docs into JSON)
- Summarization, rewriting, translation
- Code generation, code explanation, refactoring suggestions
- Multimodal Q&A (paste an image, ask a question about it)
- Reasoning-heavy workflows (math, multi-step planning) via
reasoning_effort: "high"
Quick example
Pick aurous-embed-vision-1.0 when
You need a fixed-length numeric vector representing the input’s meaning. Examples:
- Semantic search — embed your documents at index time, embed the user’s query at search time, return top-K nearest documents by cosine similarity
- Retrieval-augmented generation (RAG) — pull semantically-relevant chunks from a knowledge base, then pass them into a chat completion as context
- Classification — embed labeled examples once, then embed new inputs and route based on nearest-neighbor label
- Recommendation — “more like this” via vector similarity
- Deduplication — cluster vectors and collapse near-duplicates
- Multimodal cross-search — find images via a text query (or vice versa) because text and image embeddings share the same vector space
Quick example
The multimodal differentiator
aurous-embed-vision-1.0 is multimodal in one call — text + image + video → one combined vector. This is the platform’s headline differentiator over OpenAI’s embedding API, which is text-only.
The 256K context window (chat) vs 128K (embedding)
The chat model handles long-context Q&A and document summarization (long PDFs, full meeting transcripts, large code repositories) up to 256K tokens (~200K English words). The embedding model is capped at 128K — appropriate for one document chunk plus context. If you need to embed a 500K-token document, chunk it: split into 5 × 100K chunks, embed each, and either:- Use the average (cosine-similarity-friendly) for whole-doc retrieval
- Index each chunk independently for fine-grained retrieval (recommended for RAG)
Common patterns
RAG pipeline
embed-vision for indexing + query embedding, chat-pro for the final answer synthesis.
Multimodal product catalog
Day-1 limitations to know
- Only one chat model + one embedding model. A “mini” or “haiku” chat tier and a text-only embedding tier are on the v1.1 roadmap for cost-sensitive workloads.
- No image generation in chat.
aurous-grow-2.0-prodoes NOT call/v1/imagesfor you; if you want a chat completion to produce an image, you write the tool call yourself. (See Chat tools.) - No audio yet. No speech-to-text, no text-to-speech, no audio embeddings on the platform.
- No assistants / threads API. The chat completions surface is stateless; if you need threading, build it client-side.
Where to next?
- Chat overview — the full chat surface
- Embeddings overview — the full embedding surface
- Multimodal embeddings — the multimodal differentiator deep dive
- Pricing — chat rate card
- Embedding pricing — embedding rate card with per-modality math
GET /v1/models— the live model catalog

