POST /v1/embeddings produces a vector representation of your input that you can store, index, and use for semantic search, retrieval-augmented generation (RAG), classification, and similarity ranking. The surface is OpenAI-shaped — if your code already talks to embeddings.create, point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.
Authenticate by sending your API key either as Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_.... Both are accepted.
When to use embeddings
Embeddings convert input into a fixed-length float vector. Use them when you need to compare meaning rather than generate text:
- Semantic search: embed documents at index time, embed the user’s query at search time, and return the top-K nearest documents by cosine similarity.
- Retrieval-augmented generation (RAG): pull semantically relevant chunks from your knowledge base, then pass them into a chat completion as context.
- Classification: embed labeled examples once, then embed each new input and route it by the label of its nearest neighbor.
- Similarity ranking: deduplicate, cluster, or surface “more like this” recommendations.
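The semantic-search pattern above reduces to cosine similarity plus a sort. The sketch below assumes you have already stored each document's vector (the data[0].embedding value from a response) in an in-memory dict keyed by a document id of your choosing; cosine_similarity and top_k are illustrative helpers, not part of any Aurous SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: dict[str, Sequence[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs whose stored vectors are closest to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for prototypes; for large corpora the same scoring is usually delegated to a vector index.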
When NOT to use embeddings
- Generating text: embeddings don’t produce text; use chat completions (POST /v1/chat/completions).
- Generating images: use image generation (POST /v1/images).
- N→N batch embedding: v1 does not accept string[] batch input (the underlying model concatenates batched text into a single combined vector, the opposite of OpenAI’s N→N semantics). Loop client-side to embed multiple items independently. See Multimodal for the rationale and the workaround.
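The client-side loop for multiple items can be as small as the sketch below. embed_one is a hypothetical callable standing in for whatever sends one POST /v1/embeddings request with a single string input and returns its data[0].embedding; the guard catches the common mistake of passing one string, which Python would otherwise iterate character by character.

```python
from typing import Callable, Sequence

Vector = list[float]


def embed_each(texts: Sequence[str], embed_one: Callable[[str], Vector]) -> list[Vector]:
    """Embed each string with its own request so every item gets its own vector."""
    if isinstance(texts, str):
        # A bare string is a Sequence[str] too; refuse it instead of embedding each character.
        raise TypeError("pass a sequence of strings, not a single string")
    return [embed_one(text) for text in texts]
```

With the OpenAI Python SDK pointed at Aurous Labs, embed_one could be lambda t: client.embeddings.create(model="aurous-embed-vision-1.0", input=t).data[0].embedding.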
Models
List available models at GET /v1/models. The day-1 embedding model is aurous-embed-vision-1.0, a multimodal embedding model with a 128K context window that accepts text, image, and video parts and returns a single combined vector. Each model row has a top-level kind field that is either chat or embedding; sending an embedding model to /v1/chat/completions (or a chat model here) returns 400 model_wrong_kind.
Context window, supported output dimensions, capabilities, and per-modality credit rates are returned on the aurous_metadata extension of each model row.
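Routing on the kind field before sending avoids 400 model_wrong_kind. This sketch assumes the GET /v1/models body follows the OpenAI list shape (a data array of model rows with an id); only the kind values chat and embedding come from this page.

```python
def embedding_models(models_response: dict) -> list[str]:
    """Return the ids of model rows whose top-level kind is "embedding"."""
    return [m["id"] for m in models_response.get("data", []) if m.get("kind") == "embedding"]


def endpoint_for(model_row: dict) -> str:
    """Pick the endpoint that matches a model row's kind."""
    kind = model_row.get("kind")
    if kind == "embedding":
        return "/v1/embeddings"
    if kind == "chat":
        return "/v1/chat/completions"
    raise ValueError(f"unknown model kind: {kind!r}")
```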
Pricing
Embeddings are billed per token at credit rates surfaced on each model row at GET /v1/models (the embedding_pricing block) and in the usage block of every embedding response. The rate has three axes: text-token input, visual-token input (images), and video-token input. The rate on /v1/models is the caller’s effective rate, with any per-team override already applied. See Pricing for the credit math, worked examples, and the rate-mutability rules.
To preview cost without charging, send the same payload to POST /v1/embeddings/estimate. Estimates are an upper bound: for URL-fetched media, the real credits_charged may come in a few percent lower.
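The three-axis rate means a charge is a sum of three products: tokens of each modality times that modality's per-token rate. The sketch below assumes rates are expressed as credits per token and that you already know the per-modality token counts; the EmbeddingRates field names are hypothetical and are not the actual layout of the embedding_pricing block, and it ignores any rounding the billing side may apply. Treat Pricing and usage.credits_charged as authoritative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingRates:
    """Credits per token for each input modality (hypothetical field names)."""
    text: float
    visual: float
    video: float


def credits_for(tokens: dict[str, int], rates: EmbeddingRates) -> float:
    """Sum tokens x rate over the text, visual, and video axes; missing modalities count as 0."""
    return (
        tokens.get("text", 0) * rates.text
        + tokens.get("visual", 0) * rates.visual
        + tokens.get("video", 0) * rates.video
    )
```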
Quick start
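A minimal single-string request, built with Python's standard library only so it runs without any SDK installed. BASE_URL is a placeholder, not the real Aurous Labs endpoint; substitute the base URL for your account. The function builds the request without sending it.

```python
import json
import urllib.request

# Placeholder base URL: replace with the Aurous Labs API base URL for your account.
BASE_URL = "https://api.aurous.example/v1"


def build_embedding_request(api_key: str, text: str,
                            model: str = "aurous-embed-vision-1.0") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/embeddings request for one string input."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # X-Api-Key: <key> is also accepted
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to urllib.request.urlopen and read payload["data"][0]["embedding"] from the decoded JSON. With the OpenAI Python SDK, the equivalent is OpenAI(base_url=..., api_key=...) followed by client.embeddings.create(model="aurous-embed-vision-1.0", input="...").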
Response shape
Responses use the standard OpenAI shape, with an Aurous extension on usage.
The usage.credits_charged value is the exact amount deducted from your team’s balance. The usage.breakdown.input block decomposes the charge across the three input modalities so you can correlate cost with input shape: text, visual, and video are always present, with 0 for any modality not used in the request.
data is always a single-element array on v1 (one combined embedding per request). See Multimodal for the rationale.
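A defensive parser for the response can check both guarantees described above: exactly one element in data, and all three modality keys in usage.breakdown.input. The field paths come from this page; the function itself is an illustrative helper and does not interpret the breakdown values.

```python
from typing import Any


def parse_embedding_response(payload: dict[str, Any]) -> tuple[list[float], Any, dict[str, Any]]:
    """Return (vector, credits_charged, breakdown.input) from a decoded response body."""
    data = payload["data"]
    if len(data) != 1:
        # v1 always returns one combined embedding per request.
        raise ValueError(f"expected 1 embedding, got {len(data)}")
    usage = payload["usage"]
    breakdown = usage["breakdown"]["input"]
    missing = {"text", "visual", "video"} - set(breakdown)
    if missing:
        raise ValueError(f"breakdown.input is missing {sorted(missing)}")
    return data[0]["embedding"], usage["credits_charged"], breakdown
```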
Idempotency
Pass Idempotency-Key (any opaque value, 1–256 chars; UUID v4 recommended) to make POST /v1/embeddings safe to retry. The same key with the same body within 24h replays the cached response with Aurous-Idempotent-Replayed: true. The same key with a different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
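The key point for retries is to generate the key once per logical request and reuse it, with an identical body, on every attempt. In this sketch send is a hypothetical transport callable (body, headers) -> response, and treating ConnectionError as the only retryable failure is a simplifying assumption.

```python
import uuid


def new_idempotency_key() -> str:
    """One UUID v4 per logical request."""
    return str(uuid.uuid4())


def idempotent_headers(api_key: str, idempotency_key: str) -> dict[str, str]:
    if not 1 <= len(idempotency_key) <= 256:
        raise ValueError("Idempotency-Key must be 1-256 characters")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }


def was_replayed(response_headers: dict[str, str]) -> bool:
    """True when the server served a cached response for a repeated key."""
    # Header names are case-insensitive in HTTP, so normalize before looking up.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return lowered.get("aurous-idempotent-replayed", "").lower() == "true"


def post_with_retries(send, body: dict, api_key: str, attempts: int = 3):
    """Retry a dropped request with the same key and body, so a retry cannot double-charge."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    key = new_idempotency_key()  # generated once, outside the retry loop
    for attempt in range(attempts):
        try:
            return send(body, idempotent_headers(api_key, key))
        except ConnectionError:
            if attempt == attempts - 1:
                raise
```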
Limits
- 16 content parts maximum per request (across all modalities combined).
- 8 image_url parts maximum per request.
- 1 video_url part maximum per request.
- 128K tokens total input across text + image + video (after tokenization). Requests over the cap return 400 embeddings_input_too_large.
- Image and video URLs must be HTTPS-reachable in under 10 seconds. URL strings are capped at 2048 characters.
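The count and URL limits can be checked client-side before a request is sent; the 128K-token cap cannot, because it depends on server-side tokenization. The content-part shape used here ({"type": "image_url", "image_url": {"url": ...}} and the video_url equivalent) is an assumption modeled on OpenAI-style parts; see Multimodal for the actual input shape. Note also that the Errors list below says video_url parts are currently rejected outright, so you may want to treat any video part as a violation.

```python
from urllib.parse import urlparse

MAX_PARTS = 16
MAX_IMAGES = 8
MAX_VIDEOS = 1
MAX_URL_CHARS = 2048


def check_parts(parts: list[dict]) -> list[str]:
    """Return human-readable limit violations; an empty list means the counts and URLs look OK."""
    problems = []
    if len(parts) > MAX_PARTS:
        problems.append(f"{len(parts)} parts exceeds the {MAX_PARTS}-part cap")
    images = [p for p in parts if p.get("type") == "image_url"]
    videos = [p for p in parts if p.get("type") == "video_url"]
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} image_url parts exceeds the {MAX_IMAGES}-image cap")
    if len(videos) > MAX_VIDEOS:
        problems.append(f"{len(videos)} video_url parts exceeds the {MAX_VIDEOS}-video cap")
    for part in images + videos:
        url = part[part["type"]]["url"]  # assumed nesting, e.g. part["image_url"]["url"]
        if len(url) > MAX_URL_CHARS:
            problems.append(f"URL longer than {MAX_URL_CHARS} characters")
        if urlparse(url).scheme != "https":
            problems.append(f"URL is not https: {url[:60]}")
    return problems
```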
Errors
Every non-2xx response uses the standard Aurous error envelope; see Errors for the full taxonomy. Embedding-specific codes you might see:
- model_not_found (404): unknown model slug. Check the /v1/models listing.
- model_disabled (403): the model exists but an admin has deactivated it.
- model_wrong_kind (400): you sent a chat model to the embeddings endpoint (or vice versa).
- embeddings_batch_not_supported (400): input is a string[] array. v1 doesn’t accept batch input; loop client-side. See Multimodal.
- embeddings_input_too_many_items (400): over the 16-part or 8-image cap. Split into multiple requests.
- embeddings_video_unsupported (400): any video_url part is rejected (changed 2026-05-24). Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
- embeddings_input_too_large (400): the pre-fetch token estimate is over the model’s context window. Trim the input.
- embeddings_unsupported_dimensions (400): the requested dimensions value is not supported by this model. Check aurous_metadata.dimensions on the model row. See Dimensions.
- tpm_rate_limit_exceeded (429): the tokens-per-minute bucket is dry. Sleep for Retry-After, then retry.
- embeddings_provider_unknown_error (502): upstream returned an unmapped error. Retry with backoff.
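Only the 429 and 502 codes above call for a retry; the 4xx validation codes need a fixed request instead. This sketch maps a status code to a wait time: it honors Retry-After when it is given in seconds and otherwise uses exponential backoff with full jitter. The base and cap values are illustrative assumptions, not Aurous recommendations.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {429, 502}


def retry_delay(status: int, retry_after: Optional[str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None  # e.g. 400 embeddings_input_too_large: fix the request instead
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may be an HTTP date; fall back to backoff
    # Full jitter: uniform wait between 0 and the capped exponential ceiling.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```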
Related
- Multimodal: the input-shape details and the multimodal worked example.
- Dimensions: when the dimensions parameter is accepted.
- Pricing: per-token credit math and worked examples.
- Estimate: preview cost without charging.

