OpenAI’s embeddings.create accepts input as an array of strings, and the response returns one vector per input string (N → N). Many tutorials assume this shape.
Why we can’t just accept it
The Aurous embedding model is multimodal (text + image + video → one combined vector) and is published as a single-output model. If we accepted input: ["a", "b", "c"] and silently returned the OpenAI-shaped N=3 vector response, we’d be running the multimodal model in a mode that concatenates the three strings into a single combined vector, then returning that one vector three times: a wrong answer that is hard to detect at debug time.
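To illustrate why that failure would be hard to spot, here is a small sketch using a made-up response. The numbers and helper names are hypothetical and only model the scenario described above: a naive shape check passes, and only an explicit comparison reveals that the three vectors are the same one repeated.

```python
from typing import Sequence

Vector = Sequence[float]


def passes_shape_check(vectors: Sequence[Vector], n_inputs: int, dim: int) -> bool:
    """The check most callers write: right count, right dimension."""
    return len(vectors) == n_inputs and all(len(v) == dim for v in vectors)


def all_identical(vectors: Sequence[Vector]) -> bool:
    """True when every vector equals the first one, element by element."""
    if len(vectors) < 2:
        return False
    first = list(vectors[0])
    return all(list(v) == first for v in vectors[1:])


# Hypothetical bad response: one combined vector repeated for each input.
combined = [0.12, -0.48, 0.87]  # made-up values, not real model output
response_vectors = [combined, combined, combined]

shape_ok = passes_shape_check(response_vectors, n_inputs=3, dim=3)
duplicated = all_identical(response_vectors)
```

Here `shape_ok` and `duplicated` are both true, so code that validates only the response shape would accept the wrong answer.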
The two valid customer intents behind input: string[] map to different APIs:
| Intent | Aurous shape |
|---|---|
| “Embed each string as a separate vector, return a list.” | Loop client-side, one request per string. |
| “Embed all three pieces together, return one combined vector.” | Send the three pieces as content-parts (text+text+text) in one request. |
Workaround #1 — loop client-side (one vector per string)
This matches OpenAI’s N→N semantics, and it is what most code wants. Issue the requests concurrently with asyncio.gather (Python) or Promise.all (Node). The per-team rate-limit bucket is shared across all your concurrent embedding requests (see Rate limits), so the wall-clock cost of N parallel embeddings is roughly max(N × per-request-latency / concurrency, total-tokens / TPM).
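A minimal sketch of the client-side loop in Python, assuming a hypothetical async `embed_one(text)` coroutine that wraps one single-input embedding request and returns one vector. The helper names and the default concurrency cap are illustrative and not part of the Aurous SDK. `asyncio.gather` returns results in the order the awaitables were passed, so the output lines up with the input list. The second function evaluates the rough wall-clock estimate above, converting the TPM term from minutes to seconds.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Vector = List[float]


async def embed_each(
    texts: Sequence[str],
    embed_one: Callable[[str], Awaitable[Vector]],  # hypothetical single-input call
    concurrency: int = 8,  # assumed cap; tune against your rate-limit bucket
) -> List[Vector]:
    """Return one vector per input string, in input order (N -> N)."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(text: str) -> Vector:
        # At most `concurrency` requests are in flight at any time.
        async with semaphore:
            return await embed_one(text)

    # gather() preserves the order of the awaitables it was given.
    return await asyncio.gather(*(bounded(t) for t in texts))


def estimate_wall_clock_seconds(
    n: int,
    per_request_latency_s: float,
    concurrency: int,
    total_tokens: int,
    tokens_per_minute: int,
) -> float:
    """Rough estimate: the slower of the latency bound and the TPM bound."""
    latency_bound = n * per_request_latency_s / concurrency
    rate_limit_bound = total_tokens / tokens_per_minute * 60.0  # minutes -> seconds
    return max(latency_bound, rate_limit_bound)
```

With a real `embed_one` in hand, a call would look like `vectors = asyncio.run(embed_each(["a", "b", "c"], embed_one))`. Because the rate-limit bucket is shared, raising `concurrency` only helps until the TPM bound dominates.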
Workaround #2 — content-parts (one combined vector)
If your intent is to embed multiple pieces of related context (a document chunk plus a caption, say) and end up with a single semantic vector, use the content-parts array form. This stays within Aurous’s native single-vector contract.
embeddings.create does not accept the content-parts array. Use the lower-level client.post() escape hatch:
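A sketch of such a call, under stated assumptions. The text above confirms only that `client.post()` exists and that all pieces travel in one request. The `/v1/embeddings` path, the `body=` keyword, the `{"type": "text", "text": ...}` part shape, and the helper names are illustrative guesses, so check them against the Multimodal embeddings reference before relying on them.

```python
from typing import Any, Dict, List, Sequence


def build_content_parts_body(texts: Sequence[str], model: str) -> Dict[str, Any]:
    """Build one request body whose input is a list of text parts.

    ASSUMPTION: the {"type": "text", "text": ...} part shape is illustrative.
    """
    parts: List[Dict[str, str]] = [{"type": "text", "text": t} for t in texts]
    return {"model": model, "input": parts}


def embed_together(client: Any, texts: Sequence[str], model: str) -> Any:
    """Send all pieces in a single request and return the raw response.

    ASSUMPTION: the path and the `body=` keyword are placeholders for
    whatever client.post() actually expects.
    """
    body = build_content_parts_body(texts, model)
    return client.post("/v1/embeddings", body=body)
```

Keeping the body builder separate from the network call makes it easy to inspect the payload that will be sent, which helps when confirming that the request holds several parts rather than several inputs.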
Why not a server-side auto-promote shim?
We considered transparently converting input: ["a", "b", "c"] to one of the workarounds server-side. We chose not to:
- Auto-loop on the server: would silently translate OpenAI’s N→N intent into N separate billed requests under the hood, defeating the cost-transparency story (credits_charged would report a per-request amount that doesn’t match the single request the customer thinks they sent).
- Auto-content-parts: would silently translate into one combined vector, which is the opposite of what most callers want when they paste an OpenAI tutorial.
An explicit error that carries a doc_url is the least-surprising path.
We may revisit this with an explicit opt-in (e.g. encoding_format: "openai_batch") once we have telemetry on real customer patterns.
Where to next?
- Multimodal embeddings — the full content-parts surface
- Embedding limits — caps on parts, characters, URLs
- Embedding pricing — per-modality credit rates
- Embedding estimate — preview cost without charging

