Embeddings

POST /v1/embeddings produces a vector representation of your input that you can store, index, and use for semantic search, retrieval-augmented generation (RAG), classification, and similarity ranking. The surface is OpenAI-shaped — if your code already talks to embeddings.create, point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.

baseURL: https://api.aurous-labs.com/v1
apiKey:  al_live_<your-hex-key>

Pass the key either as Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_.... Both are accepted.
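
As a minimal sketch of the two accepted header styles, the helper below builds either form. The function name auth_headers and its scheme argument are illustrative assumptions, not part of the API.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return HTTP headers carrying the API key in one of the two accepted forms."""
    if scheme == "bearer":
        # What OpenAI SDKs send by default.
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-Api-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")
```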

When to use embeddings

Embeddings convert input into a fixed-length float vector. Use them when you need to compare meaning rather than generate text:

Semantic search — embed documents at index time, embed the user’s query at search time, return the top-K nearest documents by cosine similarity.
Retrieval-augmented generation (RAG) — pull semantically-relevant chunks from your knowledge base, then pass them into a chat completion as context.
Classification — embed labeled examples once, then embed new inputs and route based on nearest-neighbor label.
Similarity ranking — deduplicate, cluster, or surface “more like this” recommendations.
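
The semantic search flow above can be sketched in a few lines of plain Python. This is an illustration only: it assumes the document vectors are already stored in a dict keyed by a document id of your choosing, and the names cosine_similarity and top_k are hypothetical helpers. A production system would usually delegate this to a vector index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Rank stored vectors against the query; return [(doc_id, score)], best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```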

When NOT to use embeddings

Generating text — embeddings don’t produce text; use chat completions (POST /v1/chat/completions).
Generating images — use image generation (POST /v1/images).
N→N batch embedding — v1 does not accept string[] batch input (the underlying model concatenates batched text into a single combined vector, opposite of OpenAI’s N→N semantics). Loop client-side to embed multiple items independently. See Multimodal for the rationale and the workaround.
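
A minimal sketch of the client-side loop, assuming you supply a function that embeds one string per request. The names embed_many and embed_one are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def embed_many(texts: Sequence[str],
               embed_one: Callable[[str], Vector]) -> List[Vector]:
    """Embed each text with its own request, preserving input order."""
    # One call per item: v1 returns one combined vector per request,
    # so independent vectors require independent requests.
    return [embed_one(text) for text in texts]
```

With the SDK client shown under Quick start, embed_one could be a small wrapper that calls client.embeddings.create(model="aurous-embed-vision-1.0", input=text) and returns data[0].embedding.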

Models

List available models at GET /v1/models. The day-1 embedding model is aurous-embed-vision-1.0 — a multimodal embedding model with a 128K context window that accepts text, image, and video parts and returns a single combined vector. Each model row has a top-level kind field that is either chat or embedding; sending an embedding model to /v1/chat/completions (or a chat model here) returns 400 model_wrong_kind. Context window, supported output dimensions, capabilities, and per-modality credit rates are returned on the aurous_metadata extension of each model row.
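
To avoid model_wrong_kind, you can filter the model listing by kind before choosing a model. The sketch below assumes the listing follows the OpenAI list shape (a top-level data array whose rows carry an id). Only the kind field is confirmed by this page, so treat the other field names as assumptions and check the actual response.

```python
def embedding_model_ids(models_response: dict) -> list:
    """Return the ids of rows whose top-level kind is 'embedding'."""
    rows = models_response.get("data", [])  # assumed OpenAI-style list wrapper
    return [row["id"] for row in rows if row.get("kind") == "embedding"]
```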

Pricing

Embeddings are billed per token at credit rates surfaced on each model row at GET /v1/models (the embedding_pricing block) and in the usage block of every embedding response. The rate has three axes: text-token input, visual-token input (images), and video-token input. The rate on /v1/models is the caller’s effective rate — any per-team override is already applied. See Pricing for the credit math, worked examples, and the rate-mutability rules. To preview cost without charging, send the same payload to POST /v1/embeddings/estimate. Estimates are an upper bound — the real credits_charged may differ a few percent for URL-fetched media.
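
Because the estimate endpoint takes the same payload, one way to structure a client is to build the payload once and switch only the path. The helper below is a sketch with a hypothetical name. It stops at building the request because the estimate response shape is not described on this page.

```python
BASE_URL = "https://api.aurous-labs.com/v1"

def embeddings_request(payload: dict, estimate: bool = False):
    """Return (url, payload) for a real embedding call or a cost preview."""
    path = "/embeddings/estimate" if estimate else "/embeddings"
    # The payload is passed through unchanged: both endpoints accept the same body.
    return BASE_URL + path, payload
```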

Quick start

curl -X POST https://api.aurous-labs.com/v1/embeddings \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aurous-labs.com/v1",
  apiKey: process.env.AUROUS_API_KEY!, // al_live_xxxxxxxxxxxxxxxx
});

const res = await client.embeddings.create({
  model: "aurous-embed-vision-1.0",
  input: "The quick brown fox jumps over the lazy dog.",
});

console.log(res.data[0].embedding.length); // → e.g. 2048
// usage carries credits_charged + per-modality breakdown
console.log(res.usage);

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aurous-labs.com/v1",
    api_key="al_live_xxxxxxxxxxxxxxxx",  # or read from env
)

res = client.embeddings.create(
    model="aurous-embed-vision-1.0",
    input="The quick brown fox jumps over the lazy dog.",
)

print(len(res.data[0].embedding))  # → e.g. 2048
print(res.usage)  # includes credits_charged

Response shape

Responses are standard OpenAI shape with an Aurous extension on usage:

{
  "object": "list",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, /* ... */]
    }
  ],
  "model": "aurous-embed-vision-1.0",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12,
    "credits_charged": 0.000225,
    "breakdown": {
      "input": { "text": 0.000225, "visual": 0, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}

The usage.credits_charged value is the exact amount deducted from your team’s balance. The usage.breakdown.input block decomposes the charge across the three input modalities so you can correlate cost to input shape. The text, visual, and video keys are always present, with 0 for any modality not used in the request. On v1, data is always a single-element array (one combined embedding per request). See Multimodal for the rationale.
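
As a sketch of how a client might read this shape, the function below pulls out the single embedding's length and checks that the per-modality breakdown adds up to credits_charged. The function name and the returned keys are hypothetical. The sum check assumes the breakdown is an exact decomposition of the charge, as described above.

```python
import math

def summarize_usage(response: dict) -> dict:
    """Extract vector length and cost details from a parsed embeddings response."""
    data = response["data"]
    if len(data) != 1:
        raise ValueError(f"expected exactly one embedding, got {len(data)}")
    usage = response["usage"]
    per_modality = usage["breakdown"]["input"]  # keys: text, visual, video
    return {
        "dimensions": len(data[0]["embedding"]),
        "credits_charged": usage["credits_charged"],
        "per_modality": dict(per_modality),
        # Tolerant comparison, since credits are floating-point values.
        "breakdown_matches": math.isclose(
            sum(per_modality.values()), usage["credits_charged"],
            rel_tol=1e-9, abs_tol=1e-12,
        ),
    }
```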

Idempotency

Pass Idempotency-Key (any opaque value, 1–256 chars; UUID v4 recommended) to make POST /v1/embeddings safe to retry. Same key + same body within 24h replays the cached response with Aurous-Idempotent-Replayed: true. Same key + different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
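
The key point for retries is that the same key must accompany every attempt of the same request. A minimal sketch, assuming a caller-supplied send function that performs the POST with the given headers and raises ConnectionError on a transport failure. The names post_with_retries and send are hypothetical, and real code would also wait between attempts.

```python
import uuid
from typing import Callable

def post_with_retries(send: Callable[[dict], dict], max_attempts: int = 3) -> dict:
    """Call send(headers) until it succeeds, reusing one Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the key once, outside the loop, so every retry replays
    # the same request instead of creating a new one.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```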

Limits

16 content parts maximum per request (across all modalities combined).
8 image_url parts maximum per request.
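1 video_url part maximum per request. Note that the embeddings_video_unsupported entry under Errors states that any video_url part is rejected as of 2026-05-24.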
128K tokens total input across text + image + video (after tokenization). Over the cap returns 400 embeddings_input_too_large.
Image and video URLs must be HTTPS-reachable in under 10 seconds. URL strings are capped at 2048 characters.

See Multimodal for the input shape and the worked multimodal example.
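
Some of these limits can be checked client-side before sending a request. The sketch below assumes an OpenAI-style part shape, where each part is a dict with a type of text, image_url, or video_url, and media parts nest the address under a url key. That shape is an assumption here (the Multimodal page is the authority), and check_parts is a hypothetical helper. The token cap and the 10-second fetch window cannot be verified offline, so they are not covered.

```python
MAX_PARTS = 16
MAX_IMAGES = 8
MAX_VIDEOS = 1
MAX_URL_CHARS = 2048

def check_parts(parts: list) -> list:
    """Return human-readable limit violations (an empty list means none found)."""
    problems = []
    if len(parts) > MAX_PARTS:
        problems.append(f"{len(parts)} parts exceeds the {MAX_PARTS}-part cap")
    images = sum(1 for p in parts if p.get("type") == "image_url")
    videos = sum(1 for p in parts if p.get("type") == "video_url")
    if images > MAX_IMAGES:
        problems.append(f"{images} image_url parts exceeds the cap of {MAX_IMAGES}")
    if videos > MAX_VIDEOS:
        problems.append(f"{videos} video_url parts exceeds the cap of {MAX_VIDEOS}")
    for i, part in enumerate(parts):
        kind = part.get("type")
        if kind not in ("image_url", "video_url"):
            continue
        media = part.get(kind)  # assumed nesting: {"image_url": {"url": ...}}
        url = media.get("url", "") if isinstance(media, dict) else ""
        if not url.startswith("https://"):
            problems.append(f"part {i}: URL must use HTTPS")
        if len(url) > MAX_URL_CHARS:
            problems.append(f"part {i}: URL longer than {MAX_URL_CHARS} characters")
    return problems
```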

Errors

Every non-2xx response uses the standard Aurous error envelope — see Errors for the full taxonomy. Embedding-specific codes you might see:

model_not_found (404) — unknown model slug. Check the /v1/models listing.
model_disabled (403) — the model exists but an admin has deactivated it.
model_wrong_kind (400) — you sent a chat model to the embeddings endpoint (or vice versa).
embeddings_batch_not_supported (400) — input is a string[] array. v1 doesn’t accept batch input; loop client-side. See Multimodal.
embeddings_input_too_many_items (400) — over the 16-part or 8-image cap. Split into multiple requests.
embeddings_video_unsupported (400) — any video_url part is rejected (changed 2026-05-24). Extract a representative frame in your pipeline and submit it as image_url (bills at the visual rate).
embeddings_input_too_large (400) — pre-fetch token estimate is over the model’s context window. Trim input.
embeddings_unsupported_dimensions (400) — the requested dimensions is not supported by this model. Check aurous_metadata.dimensions on the model row. See Dimensions.
tpm_rate_limit_exceeded (429) — the tokens-per-minute bucket is empty. Wait for the duration given in Retry-After, then retry.
embeddings_provider_unknown_error (502) — upstream returned an unmapped error. Retry with backoff.
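
The retry guidance in this list can be condensed into a small decision function. This is a sketch under stated assumptions: retry_delay is a hypothetical helper, Retry-After is assumed to carry a number of seconds, and the exponential schedule (doubling, capped at 30 seconds) and the 1-second fallback are illustrative choices rather than documented values. The status and code are passed in directly so the sketch does not depend on the error envelope's layout.

```python
from typing import Optional

def retry_delay(status: int, code: str,
                retry_after: Optional[str], attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying will not help."""
    if status == 429 and code == "tpm_rate_limit_exceeded":
        # Honor the server's hint when present (assumed to be seconds).
        return float(retry_after) if retry_after else 1.0
    if status == 502 and code == "embeddings_provider_unknown_error":
        # Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
        return min(2.0 ** attempt, 30.0)
    # The 400, 403, and 404 codes above describe the request itself,
    # so the same request would fail again.
    return None
```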

Related

Multimodal — the input-shape details and the multimodal worked example.
Dimensions — when the dimensions parameter is accepted.
Pricing — per-token credit math and worked examples.
Estimate — preview cost without charging.
