Code: embeddings_input_too_many_items
HTTP status: 400
Type: invalid_request

When it fires

The input content-parts array on POST /v1/embeddings is over one of the pre-fetch safety caps:
Limit: Cap
Total content parts per request (text + image_url combined): 16
image_url parts per request: 8
video_url parts per request: 0 (rejected with embeddings_video_unsupported as of 2026-05-24)
The platform applies these caps before fetching any media or calling the model — the request fails fast at the DTO boundary so you don’t pay for over-budget calls. The error.message echoes which cap was hit and the actual count you sent.
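Because the caps are fixed numbers, they can also be checked on the client before a request is sent. The following is a minimal sketch rather than part of any official SDK: `findCapViolations`, the `ContentPart` type, and the constant names are hypothetical, and the shape of a `video_url` part is assumed to mirror `image_url`. The cap values are copied from the limits listed above; the platform remains the source of truth.

```typescript
// Caps copied from the limits listed above (assumed to be current).
const MAX_TOTAL_PARTS = 16;
const MAX_IMAGE_PARTS = 8;

// Hypothetical part shapes; the video_url shape is an assumption.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "video_url"; video_url: { url: string } };

// Hypothetical helper: returns one message per violated cap,
// or an empty array when the input fits within every cap.
function findCapViolations(parts: ContentPart[]): string[] {
  const violations: string[] = [];
  const imageCount = parts.filter((p) => p.type === "image_url").length;
  const videoCount = parts.filter((p) => p.type === "video_url").length;

  if (parts.length > MAX_TOTAL_PARTS) {
    violations.push(`input has ${parts.length} parts; the maximum is ${MAX_TOTAL_PARTS}`);
  }
  if (imageCount > MAX_IMAGE_PARTS) {
    violations.push(`input has ${imageCount} image_url parts; the maximum is ${MAX_IMAGE_PARTS}`);
  }
  if (videoCount > 0) {
    violations.push(`input has ${videoCount} video_url parts; none are accepted`);
  }
  return violations;
}
```

Calling this before `POST /v1/embeddings` only saves a round trip. The server-side check still applies, so the error should be handled either way.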

How to fix it

Split the work into multiple requests. Embeddings are independent — vectors from separate requests land in your index the same way; there is no fan-out mode that combines more than 16 parts into a single embedding. For a typical RAG-for-images workflow that wants one vector per image + caption, split by image:
const items = [
  { caption: "Front view", url: "https://assets.aurous-labs.com/example-images/front.jpg" },
  { caption: "Back view",  url: "https://assets.aurous-labs.com/example-images/back.jpg" },
  // ... 20 more items
];

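// `client` is an already-configured API client instance.
// One request per item: each sends 2 parts (caption + image), well under the 16-part cap.
const vectors = await Promise.all(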
  items.map((item) =>
    client.embeddings.create({
      model: "aurous-embed-vision-1.0",
      input: [
        { type: "text", text: item.caption },
        { type: "image_url", image_url: { url: item.url } },
      ],
    }),
  ),
);
For a use case that really does want one combined vector from many parts — e.g., embedding a long mixed-media document — trim to the 16-part / 8-image cap by combining text fragments into fewer, longer text parts (the 1M-character per-text-part cap is generous enough for most concatenations).
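One way to combine text fragments is to group contiguous fragments evenly into however many text slots remain after the images are counted. The sketch below illustrates that idea under stated assumptions: `packTextFragments` and the constant names are hypothetical, the separator is an arbitrary choice, and JavaScript's `.length` counts UTF-16 code units, so the length check is a conservative approximation of the 1M-character cap.

```typescript
// Caps taken from this page (assumed to be current).
const TOTAL_PART_CAP = 16;
const IMAGE_PART_CAP = 8;
const TEXT_CHAR_CAP = 1_000_000;

type TextPart = { type: "text"; text: string };

// Hypothetical helper: merges contiguous text fragments so that
// (merged text parts + imageCount) stays within the total-parts cap.
function packTextFragments(
  fragments: string[],
  imageCount: number,
  separator = "\n\n",
): TextPart[] {
  if (imageCount > IMAGE_PART_CAP) {
    throw new Error(`too many images: ${imageCount} > ${IMAGE_PART_CAP}`);
  }
  if (fragments.length === 0) return [];

  // Slots left for text once the images are accounted for (at least 8 here).
  const textBudget = TOTAL_PART_CAP - imageCount;
  const groupCount = Math.min(textBudget, fragments.length);
  const groupSize = Math.ceil(fragments.length / groupCount);

  const parts: TextPart[] = [];
  for (let i = 0; i < fragments.length; i += groupSize) {
    const text = fragments.slice(i, i + groupSize).join(separator);
    // .length counts UTF-16 code units, which never undercounts characters.
    if (text.length > TEXT_CHAR_CAP) {
      throw new Error(`merged text part exceeds ${TEXT_CHAR_CAP} characters`);
    }
    parts.push({ type: "text", text });
  }
  return parts;
}
```

For example, 40 fragments alongside 8 images would be packed into 8 text parts of 5 fragments each, for 16 parts in total. Fragment order is preserved within and across the merged parts.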

Why a cap exists

A multimodal embedding request resolves to a single model call that fetches every URL, tokenizes every part, and produces one vector. Without a cap, one request could consume an arbitrary amount of upstream budget, so the platform sets a deterministic ceiling that keeps a single malformed request from blowing up a job. The same caps drive the cost projection in /v1/embeddings/estimate, so estimates are bounded.

Example response

{
  "error": {
    "type": "invalid_request",
    "code": "embeddings_input_too_many_items",
    "message": "input has 22 parts; the maximum is 16 per request. Split into multiple requests.",
    "param": "input",
    "doc_url": "https://docs.aurous-labs.com/errors#embeddings_input_too_many_items",
    "request_id": "req_01HXMQ7Z3K8Y2ABCDEFGHJKM"
  }
}
No credits are charged for a request rejected at the DTO boundary.
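Client code can branch on the `code` field of this response rather than matching on the message text. The sketch below assumes the response body has already been parsed from JSON and follows the shape in the example above; `isTooManyItemsError` and `requestIdOf` are hypothetical helper names, not part of any SDK.

```typescript
const TOO_MANY_ITEMS = "embeddings_input_too_many_items";

// Hypothetical helper: safely reads the `error` object from a parsed body.
function errorObjectOf(body: unknown): Record<string, unknown> | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as Record<string, unknown>).error;
  if (typeof err !== "object" || err === null) return null;
  return err as Record<string, unknown>;
}

// True when the parsed error body carries this page's error code.
function isTooManyItemsError(body: unknown): boolean {
  return errorObjectOf(body)?.code === TOO_MANY_ITEMS;
}

// Returns the request_id for logging, or null when it is absent.
function requestIdOf(body: unknown): string | null {
  const id = errorObjectOf(body)?.request_id;
  return typeof id === "string" ? id : null;
}
```

When `isTooManyItemsError` returns true, retrying the same payload will fail again. The input has to be split or trimmed first.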