OpenAI’s embeddings.create accepts input as a string array, and the response returns one vector per input string (N → N). Many tutorials assume this shape:
# OpenAI shape — N strings → N vectors
res = openai.embeddings.create(
    model="text-embedding-3-large",
    input=["alpha", "beta", "gamma"],
)
print(len(res.data))  # 3
Pointing that exact call at Aurous Labs returns:
{
  "error": {
    "type": "invalid_request",
    "code": "embeddings_batch_not_supported",
    "message": "POST /v1/embeddings does not accept string[] batch input. The underlying multimodal model concatenates batched text into a single combined vector — opposite of OpenAI's N→N semantics. Loop client-side or use content-parts input. See https://docs.aurous-labs.com/api-reference/embeddings/openai-batch-incompat.",
    "doc_url": "https://docs.aurous-labs.com/errors#embeddings_batch_not_supported",
    "request_id": "req_..."
  }
}
This page exists because the behavior is intentional and the workaround takes only a few lines of code.
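If you want client code to recognize this specific rejection instead of treating it as a generic 400, a minimal sketch follows. It assumes only the error body shown above; the helper name `is_batch_not_supported` and the constant are illustrative, not part of any SDK.

```python
import json
from typing import Any, Mapping

# Error code copied from the response body shown above.
BATCH_NOT_SUPPORTED = "embeddings_batch_not_supported"


def is_batch_not_supported(payload: Mapping[str, Any]) -> bool:
    """Return True if a decoded error body carries the batch-rejection code.

    Accepts either the full body ({"error": {...}}) or only the inner
    error object, since HTTP clients differ in which one they surface.
    """
    error = payload.get("error", payload)
    return isinstance(error, Mapping) and error.get("code") == BATCH_NOT_SUPPORTED


# The body from above; the message is shortened to its first sentence and
# request_id stays elided exactly as in the docs.
raw = """
{
  "error": {
    "type": "invalid_request",
    "code": "embeddings_batch_not_supported",
    "message": "POST /v1/embeddings does not accept string[] batch input.",
    "doc_url": "https://docs.aurous-labs.com/errors#embeddings_batch_not_supported",
    "request_id": "req_..."
  }
}
"""
body = json.loads(raw)

print(is_batch_not_supported(body))           # True
print(is_batch_not_supported(body["error"]))  # True
# "some_other_code" is a placeholder, not a documented Aurous error code.
print(is_batch_not_supported({"error": {"code": "some_other_code"}}))  # False
```

With the OpenAI Python SDK, a 400 is raised as `openai.BadRequestError`; assuming the SDK exposes the decoded payload on the exception's `body` attribute, the same helper could be applied to `exc.body`.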

Why we can’t just accept it

The Aurous embedding model is multimodal (text + image + video → one combined vector) and is published as a single-output model. If we accepted input: ["a", "b", "c"] and silently returned an OpenAI-shaped response with N=3 vectors, we’d be running the multimodal model in a mode that concatenates the three strings into a single combined vector, then returning that one vector three times: a wrong answer that is hard to detect when debugging. The two valid customer intents behind input: string[] map to different APIs:
| Intent | Aurous shape |
| --- | --- |
| “Embed each string as a separate vector, return a list.” | Loop client-side, one request per string. |
| “Embed all three pieces together, return one combined vector.” | Send the three pieces as content-parts (text + text + text) in one request. |
The 400 error makes the disambiguation explicit so we don’t silently choose the wrong one for you.
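To make the two intents concrete, here is a rough sketch of the request bodies each one produces. The function name `build_embedding_requests` and the `intent` labels are hypothetical; the body shapes follow the request examples on this page (a plain string for one vector per request, a list of text content-parts for one combined vector).

```python
from typing import Any, Literal

# Model name taken from the examples on this page.
MODEL = "aurous-embed-vision-1.0"


def build_embedding_requests(
    pieces: list[str],
    intent: Literal["separate", "combined"],
) -> list[dict[str, Any]]:
    """Return the JSON bodies to POST to /v1/embeddings for the given intent."""
    if not pieces:
        raise ValueError("pieces must not be empty")
    if intent == "separate":
        # One request per string: N bodies, each yielding one vector.
        return [{"model": MODEL, "input": text} for text in pieces]
    if intent == "combined":
        # One request with text content-parts: a single combined vector.
        parts = [{"type": "text", "text": text} for text in pieces]
        return [{"model": MODEL, "input": parts}]
    raise ValueError(f"unknown intent: {intent!r}")


pieces = ["alpha", "beta", "gamma"]
print(len(build_embedding_requests(pieces, "separate")))  # 3
print(len(build_embedding_requests(pieces, "combined")))  # 1
```

Forcing the caller to name the intent mirrors what the 400 does on the server side: neither shape is chosen implicitly.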

Workaround #1 — loop client-side (one vector per string)

This matches OpenAI’s N→N semantics. Most code wants this.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

inputs = ["alpha", "beta", "gamma"]
vectors = []
for text in inputs:
    res = client.embeddings.create(model="aurous-embed-vision-1.0", input=text)
    vectors.append(res.data[0].embedding)

print(len(vectors), len(vectors[0]))  # 3, 2048
Throughput tip: parallelize the loop with asyncio.gather (Python) or Promise.all (Node). The per-team rate-limit bucket is shared across all your concurrent embedding requests (see Rate limits), so the wall-clock cost of N parallel embeddings is roughly max(N × per-request-latency / concurrency, total-tokens / TPM), with both terms expressed in the same time unit (TPM is tokens per minute, so the second term comes out in minutes).
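A minimal sketch of the parallel loop, assuming the `AsyncOpenAI` client from the OpenAI Python SDK. The helper names (`embed_all`, `make_aurous_embedder`, `estimate_wall_clock_seconds`) and the default concurrency of 8 are illustrative choices, not Aurous recommendations. The estimator restates the rough formula above in seconds.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Vector = list[float]


async def embed_all(
    texts: Sequence[str],
    embed_one: Callable[[str], Awaitable[Vector]],
    concurrency: int = 8,
) -> list[Vector]:
    """Embed each text with its own request, at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(text: str) -> Vector:
        async with semaphore:
            return await embed_one(text)

    # gather() returns results in input order, regardless of completion order.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))


def estimate_wall_clock_seconds(
    n: int,
    latency_s: float,
    concurrency: int,
    total_tokens: int,
    tpm: int,
) -> float:
    """Rough lower bound: the slower of the latency term and the rate-limit term."""
    latency_bound = n * latency_s / concurrency
    rate_bound = total_tokens / tpm * 60.0  # tokens per minute -> seconds
    return max(latency_bound, rate_bound)


def make_aurous_embedder(api_key: str) -> Callable[[str], Awaitable[Vector]]:
    """Build an embed_one callable backed by the Aurous endpoint."""
    # Imported lazily so the helpers above can be used without the SDK installed.
    from openai import AsyncOpenAI

    client = AsyncOpenAI(base_url="https://api.aurous-labs.com/v1", api_key=api_key)

    async def embed_one(text: str) -> Vector:
        res = await client.embeddings.create(
            model="aurous-embed-vision-1.0", input=text
        )
        return res.data[0].embedding

    return embed_one
```

Usage would look like `vectors = asyncio.run(embed_all(inputs, make_aurous_embedder(api_key)))`. The semaphore caps in-flight requests so a large input list does not exhaust the shared rate-limit bucket in one burst.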

Workaround #2 — content-parts (one combined vector)

If your intent is to embed multiple pieces of related context (a document chunk plus a caption, say) and end up with a single semantic vector, use the content-parts array form. This stays within Aurous’s native single-vector contract.
cURL
curl -X POST https://api.aurous-labs.com/v1/embeddings \
  -H "X-Api-Key: $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": [
      { "type": "text", "text": "Photo of a golden retriever in a park" },
      { "type": "text", "text": "Setting: sunset, soft directional light" }
    ]
  }'
The OpenAI SDK’s typed embeddings.create does not accept the content-parts array. Use the lower-level client.post() escape hatch:
res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "text", "text": "Setting: sunset, soft directional light"},
        ],
    },
    cast_to=dict,
)
print(len(res["data"][0]["embedding"]))  # 2048
See Multimodal embeddings for the full content-parts taxonomy (text + image + video) and the per-modality rate math.

Why not a server-side auto-promote shim?

We considered transparently converting input: ["a", "b", "c"] to one of the workarounds server-side. We chose not to:
  • Auto-loop on the server: this would silently translate OpenAI’s N→N intent into N separately billed requests under the hood, defeating the cost-transparency story (credits_charged would report a per-request amount that doesn’t match the single request the customer thinks they sent).
  • Auto-content-parts: this would silently produce one combined vector, which is the opposite of what most callers want when they paste an OpenAI tutorial.
Both options would do something different from what the caller intended ~half the time. The current 400 with a pointed doc_url is the least-surprising path. We may revisit this with an explicit opt-in (e.g. encoding_format: "openai_batch") once we have telemetry on real customer patterns.
