aurous-embed-vision-1.0 is a multimodal embedding model: a single request that mixes text and image parts produces one embedding representing both together. This is the distinguishing feature of the embeddings surface — you get a single vector that captures the semantic relationship between text and visual content in the same document, not separate vectors per modality.
Video input is no longer accepted as of 2026-05-24.
video_url parts return embeddings_video_unsupported. The provider folded video frames into the visual billing bucket, so the published video rate never actually fired. Extract a representative frame in your pipeline and submit it as image_url; it bills at the visual rate.
Input shape
input accepts two shapes:
- A plain string: text-only embedding. The simplest form.
- An array of content parts: multimodal embedding. Mix text and image_url parts in one request; the model concatenates them into a single combined document and returns one embedding for the whole thing.
| type | Payload | Notes |
|---|---|---|
| text | { "text": "..." } | UTF-8 text. NULL bytes are rejected. Max 1,000,000 characters per part. |
| image_url | { "image_url": { "url": "https://..." } } | HTTPS URL, ≤ 2048 chars. Must be fetchable in under 10s. |
| video_url | n/a | Rejected as of 2026-05-24. Returns embeddings_video_unsupported. |
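To make the two shapes concrete, the sketch below builds both request bodies as Python dictionaries. Only input, text, and image_url come from this page; the model key and the placement of a type discriminator next to each payload are assumptions modeled on the table above, so check them against the actual request schema.

```python
MODEL = "aurous-embed-vision-1.0"  # model name taken from this page


def text_request(text: str) -> dict:
    """Shape 1: a plain string for a text-only embedding."""
    # "model" as a top-level key is an assumption, not confirmed by this page.
    return {"model": MODEL, "input": text}


def multimodal_request(text: str, image_url: str) -> dict:
    """Shape 2: an ordered array of content parts, embedded as one document."""
    return {
        "model": MODEL,
        "input": [
            # Assumed layout: a "type" discriminator sits beside the payload key.
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

Either body still yields a single embedding; the array form only changes what that one vector represents.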
One request → one embedding
The fundamental difference from OpenAI’s embedding API is this: the v1 surface returns exactly one embedding per request, regardless of how many parts the input array contains. When you pass a content-parts array, the model treats the parts as one ordered document and produces a single vector representing the whole thing. This is intentional: it lets you embed text and an image together so the resulting vector captures their joint meaning (a product description fused with the photo, a chart caption fused with the chart image).
Batch rejection: the string[] shape is NOT accepted on v1
OpenAI’s API accepts input: string[] and returns one embedding per string (N→N). Aurous Labs rejects that shape on v1 because the underlying model would concatenate the strings into a single document and return one combined vector — the opposite of what an OpenAI-trained customer would expect. Silently swapping semantics would cause subtle bugs in production code (your “100 documents embedded” call would return 1 unusable embedding).
The platform returns 400 embeddings_batch_not_supported whenever input is an array of pure strings. Two workarounds:
Option 1 — loop client-side
Send one request per item. This is the equivalent of OpenAI’s N→N batch semantics. Use Promise.all (Node) or asyncio.gather (Python) to parallelize.
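A minimal sketch of the client-side loop in Python follows. The embed_one callable is hypothetical: it stands in for whatever function sends a single-string request in your client and returns its vector. The concurrency cap of 8 is an arbitrary illustrative default, not a documented rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Vector = List[float]


async def embed_many(
    texts: Sequence[str],
    embed_one: Callable[[str], Awaitable[Vector]],  # hypothetical single-request sender
    max_concurrency: int = 8,  # illustrative default, not a platform limit
) -> List[Vector]:
    """Send one request per text and return the vectors in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> Vector:
        # Cap the number of requests in flight at once.
        async with semaphore:
            return await embed_one(text)

    # gather returns results in argument order, so vectors line up with texts.
    return await asyncio.gather(*(guarded(t) for t in texts))
```

Because gather preserves argument order, the i-th vector corresponds to the i-th input, which matches the N→N behavior you would get from a batch call.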
Option 2 — pass content parts for a deliberately combined embedding
If you actually want one embedding representing several text fragments fused together (e.g., a title + description + tags as one document), pass them as text content parts in a single input array.
Worked example: text + image
A typical RAG-for-images use case: embed a product photo with its description, store the vector, search later with a user’s natural-language query.
The breakdown.input.text and breakdown.input.visual fields decompose the charge across modalities so you can attribute cost to inputs. See Pricing for the per-1K credit math.
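The store-and-search flow of this worked example could look roughly like the sketch below. It assumes a hypothetical embed callable that sends one request (a string or a content-parts array) and returns the vector, an in-memory dict as the vector store, and a type discriminator on each part; none of these names come from the platform. Cosine similarity is one common ranking choice, not something this page prescribes.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

Vector = List[float]
EmbedInput = Union[str, List[dict]]


def product_parts(description: str, photo_url: str) -> List[dict]:
    """Content parts for one product; the part layout is an assumption."""
    return [
        {"type": "text", "text": description},
        {"type": "image_url", "image_url": {"url": photo_url}},
    ]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    index: Dict[str, Vector],  # product id -> stored embedding
    embed: Callable[[EmbedInput], Vector],  # hypothetical request function
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed the text query and rank stored product vectors by similarity."""
    query_vec = embed(query)
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

At indexing time you would store embed(product_parts(description, photo_url)) under the product id; at query time the user's text goes through the plain-string shape.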
Image-URL requirements
- HTTPS only. Plain HTTP and data: URIs are rejected.
- ≤ 2048 characters per URL string.
- Fetchable in under 10 seconds. Long-running fetches are treated as failed requests.
- Public reachability. The platform fetches the URL from a server-side IP, so private hosts (localhost, RFC 1918 ranges, internal VPC) are not accessible.
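Some of these requirements can be checked before you send a request. The helper below is a client-side sketch only: image_url_problem is a hypothetical name, the checks approximate the rules listed above, and the server's validation remains authoritative. It cannot verify the 10-second fetch limit or whether a DNS name resolves to a private address.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # cap stated in the requirements above


def image_url_problem(url: str) -> Optional[str]:
    """Return a likely rejection reason, or None if the URL passes these checks."""
    if len(url) > MAX_URL_LENGTH:
        return "longer than 2048 characters"
    try:
        parts = urlsplit(url)
    except ValueError:
        return "not a parseable URL"
    if parts.scheme != "https":  # also catches http: and data: URIs
        return "scheme is not https"
    host = parts.hostname
    if not host:
        return "missing host"
    if host == "localhost" or host.endswith(".localhost"):
        return "host is not publicly reachable"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name: reachability can only be confirmed by fetching
    if not address.is_global:  # RFC 1918, loopback, link-local, and similar
        return "host is not publicly reachable"
    return None
```

Running this before submission avoids spending a request on a URL that is certain to fail.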
Limits
| Limit | Cap | Code on violation |
|---|---|---|
| Total content parts per request | 16 | embeddings_input_too_many_items |
| image_url parts per request | 8 | embeddings_input_too_many_items |
| video_url parts per request | 0 (any video → reject) | embeddings_video_unsupported |
| Total input tokens (after tokenization) | 128,000 | embeddings_input_too_large |
| URL string length | 2048 chars | invalid_request (DTO validation) |
| Text part character length | 1,000,000 chars | invalid_request (DTO validation) |
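The count and length caps in this table can be mirrored in a client-side pre-check, sketched below. The function name predicted_error and the type key on each part are assumptions, and the order in which the server evaluates these rules is not documented here, so treat the returned code as a best guess. The token cap is left out because it depends on the platform's tokenizer.

```python
from typing import List, Optional

MAX_PARTS = 16  # total content parts per request
MAX_IMAGE_PARTS = 8  # image_url parts per request
MAX_TEXT_CHARS = 1_000_000  # characters per text part


def predicted_error(parts: List[dict]) -> Optional[str]:
    """Return the error code the limits table suggests, or None if within caps."""
    if any(part.get("type") == "video_url" for part in parts):
        return "embeddings_video_unsupported"
    images = sum(1 for part in parts if part.get("type") == "image_url")
    if len(parts) > MAX_PARTS or images > MAX_IMAGE_PARTS:
        return "embeddings_input_too_many_items"
    for part in parts:
        if part.get("type") == "text" and len(part.get("text", "")) > MAX_TEXT_CHARS:
            return "invalid_request"
    return None
```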
Errors
- embeddings_batch_not_supported (400): input was an array of pure strings. Loop client-side or pass content parts. See batch rejection.
- embeddings_input_too_many_items (400): over the 16-part or 8-image cap. Split into multiple requests.
- embeddings_video_unsupported (400): any video_url part is rejected. Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
- embeddings_input_too_large (400): pre-fetch tokenization estimates over the context window. Trim input or skip the largest part.
- embeddings_provider_unknown_error (502): upstream returned an error the platform’s mapping table doesn’t yet recognize. Retry with backoff; quote the request_id.
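The retry guidance for embeddings_provider_unknown_error could be implemented along the lines of the sketch below. It assumes a hypothetical send callable that performs one request and returns (status, body), and it assumes the error body exposes code and request_id keys; neither shape is confirmed by this page. The delay values are illustrative.

```python
import time
from typing import Callable, Tuple

# Only the 502 upstream error is described above as worth retrying.
RETRYABLE_CODES = {"embeddings_provider_unknown_error"}


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],  # hypothetical: one request -> (status, body)
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry retryable errors with exponential backoff; raise on anything else."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        code = body.get("code")  # assumed error-body field
        if code not in RETRYABLE_CODES or attempt == max_attempts - 1:
            # Surface the request_id so it can be quoted to support.
            raise RuntimeError(f"{status} {code} request_id={body.get('request_id')}")
        sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

The 400-class codes are raised immediately because resending the same input would fail the same way.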

