OpenAI batch input is not supported

OpenAI’s embeddings.create accepts input as a string array, and the response returns one vector per input string (N → N). Many tutorials assume this shape:

# OpenAI shape — N strings → N vectors
res = openai.embeddings.create(
    model="text-embedding-3-large",
    input=["alpha", "beta", "gamma"],
)
print(len(res.data))  # 3

Pointing that exact call at Aurous Labs returns:

{
  "error": {
    "type": "invalid_request",
    "code": "embeddings_batch_not_supported",
    "message": "POST /v1/embeddings does not accept string[] batch input. The underlying multimodal model concatenates batched text into a single combined vector — opposite of OpenAI's N→N semantics. Loop client-side or use content-parts input. See https://docs.aurous-labs.com/api-reference/embeddings/openai-batch-incompat.",
    "doc_url": "https://docs.aurous-labs.com/errors#embeddings_batch_not_supported",
    "request_id": "req_..."
  }
}

This page exists because the behavior is intentional and both workarounds are short.
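When migrating OpenAI-style code, it can help to recognize this specific failure by its code field rather than by matching the message text. The following is a minimal sketch, assuming the error body has already been parsed from JSON into a dict; `is_batch_not_supported` is a hypothetical helper name, not part of any SDK.

```python
from typing import Any, Dict

# Error code copied from the error body shown above.
BATCH_NOT_SUPPORTED = "embeddings_batch_not_supported"


def is_batch_not_supported(payload: Dict[str, Any]) -> bool:
    """Return True when a parsed error body carries the batch-input error code.

    Hypothetical helper: it only inspects the "error.code" field and makes
    no assumption about how the HTTP response was obtained.
    """
    error = payload.get("error")
    return isinstance(error, dict) and error.get("code") == BATCH_NOT_SUPPORTED


# Trimmed copy of the error body above, used as sample input.
sample = {
    "error": {
        "type": "invalid_request",
        "code": "embeddings_batch_not_supported",
    }
}
```

A caller could use this check to raise a clearer migration error that points at one of the two workarounds below.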

Why we can’t just accept it

The Aurous embedding model is multimodal (text + image + video → one combined vector) and is published as a single-output model. If we accepted input: ["a", "b", "c"] and returned an OpenAI-shaped response with N=3 vectors, the model would concatenate the three strings into a single combined vector, and we would return that same vector three times. That is a wrong answer, and one that is hard to spot while debugging. The two valid customer intents behind input: string[] map to different APIs:

Intent	Aurous shape
“Embed each string as a separate vector, return a list.”	Loop client-side, one request per string.
“Embed all three pieces together, return one combined vector.”	Send the three pieces as `content-parts` (text + text + text) in one request.

The 400 error makes the disambiguation explicit so we don’t silently choose the wrong one for you.
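The two rows of the table can be sketched as request bodies. This is an illustration only: `plan_requests` and its `intent` values are hypothetical names, and the body shapes are taken from the examples elsewhere on this page.

```python
from typing import Any, Dict, List, Sequence

# Model name taken from the examples on this page.
MODEL = "aurous-embed-vision-1.0"


def plan_requests(
    texts: Sequence[str],
    intent: str,
    model: str = MODEL,
) -> List[Dict[str, Any]]:
    """Return the POST /v1/embeddings bodies for the chosen intent.

    Hypothetical helper: it only builds request bodies and sends nothing.
    """
    if intent == "separate":
        # One request per string; each response carries its own vector.
        return [{"model": model, "input": text} for text in texts]
    if intent == "combined":
        # One request with a content-parts array; the response carries one vector.
        parts = [{"type": "text", "text": text} for text in texts]
        return [{"model": model, "input": parts}]
    raise ValueError(f"unknown intent: {intent!r}")
```

Forcing the caller to name the intent mirrors what the 400 does on the server: neither interpretation is picked by default.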

Workaround #1 — loop client-side (one vector per string)

This matches OpenAI’s N→N semantics. Most code wants this.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

inputs = ["alpha", "beta", "gamma"]
vectors = []
for text in inputs:
    res = client.embeddings.create(model="aurous-embed-vision-1.0", input=text)
    vectors.append(res.data[0].embedding)

print(len(vectors), len(vectors[0]))  # 3 2048

Node.js (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!,
});

const inputs = ["alpha", "beta", "gamma"];
const vectors: number[][] = [];
for (const text of inputs) {
  const res = await client.embeddings.create({
    model: "aurous-embed-vision-1.0",
    input: text,
  });
  vectors.push(res.data[0].embedding);
}

console.log(vectors.length, vectors[0].length); // 3 2048

Python (async)

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",
)

async def main():
    inputs = ["alpha", "beta", "gamma"]
    tasks = [
        client.embeddings.create(model="aurous-embed-vision-1.0", input=text)
        for text in inputs
    ]
    results = await asyncio.gather(*tasks)
    return [r.data[0].embedding for r in results]

vectors = asyncio.run(main())
print(len(vectors), len(vectors[0]))  # 3 2048

Throughput tip: parallelize the loop with asyncio.gather (Python) or Promise.all (Node). The per-team rate-limit bucket is shared across all your concurrent embedding requests (see Rate limits), so the wall-clock cost of N parallel embeddings is roughly max(N × per-request-latency / concurrency, total-tokens / TPM), with both terms converted to the same time unit.

Workaround #2 — content-parts (one combined vector)

If your intent is to embed multiple pieces of related context (a document chunk plus a caption, say) and end up with a single semantic vector, use the content-parts array form. This stays within Aurous’s native single-vector contract.

cURL

curl -X POST https://api.aurous-labs.com/v1/embeddings \
  -H "X-Api-Key: $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": [
      { "type": "text", "text": "Photo of a golden retriever in a park" },
      { "type": "text", "text": "Setting: sunset, soft directional light" }
    ]
  }'

The OpenAI SDK’s typed embeddings.create does not accept the content-parts array. Use the lower-level client.post() escape hatch:

Python

res = client.post(
    "/embeddings",
    body={
        "model": "aurous-embed-vision-1.0",
        "input": [
            {"type": "text", "text": "Photo of a golden retriever in a park"},
            {"type": "text", "text": "Setting: sunset, soft directional light"},
        ],
    },
    cast_to=dict,
)
print(len(res["data"][0]["embedding"]))  # 2048

Node.js (TypeScript)

const res = (await client.post("/embeddings", {
  body: {
    model: "aurous-embed-vision-1.0",
    input: [
      { type: "text", text: "Photo of a golden retriever in a park" },
      { type: "text", text: "Setting: sunset, soft directional light" },
    ],
  },
})) as { data: { embedding: number[] }[] };

console.log(res.data[0].embedding.length); // 2048

See Multimodal embeddings for the full content-parts taxonomy (text + image + video) and the per-modality rate math.

Why not a server-side auto-promote shim?

We considered transparently converting input: ["a", "b", "c"] to one of the workarounds server-side. We chose not to:

Auto-loop on the server would silently translate OpenAI’s N→N intent into N separate billed requests under the hood, defeating the cost-transparency story (credits_charged would report a per-request amount that doesn’t match the single request the customer thinks they sent).
Auto-content-parts would silently translate the call into one combined vector, which is the opposite of what most callers want when they paste an OpenAI tutorial.

Both options would do something different from what the caller intended ~half the time. The current 400 with a pointed doc_url is the least-surprising path. We may revisit this with an explicit opt-in (e.g. encoding_format: "openai_batch") once we have telemetry on real customer patterns.

Where to next?

Multimodal embeddings — the full content-parts surface
Embedding limits — caps on parts, characters, URLs
Embedding pricing — per-modality credit rates
Embedding estimate — preview cost without charging
