POST /v1/embeddings produces a vector representation of your input that you can store, index, and use for semantic search, retrieval-augmented generation (RAG), classification, and similarity ranking. The surface is OpenAI-shaped — if your code already talks to embeddings.create, point the SDK at Aurous Labs by changing two lines: the baseURL and the API key.
baseURL: https://api.aurous-labs.com/v1
apiKey:  al_live_<your-hex-key>
Pass the key either as Authorization: Bearer al_live_... (what OpenAI SDKs send by default) or as X-Api-Key: al_live_...; both headers are accepted.
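If you are not using an OpenAI SDK, the sketch below shows the two accepted header forms as plain Python dictionaries. The helper name auth_headers and its style argument are hypothetical. Only the header names and the Bearer prefix come from this page.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Return a header dict carrying the key in one of the two accepted forms.

    style="bearer" mirrors what OpenAI SDKs send by default;
    style="x-api-key" uses the alternative X-Api-Key header.
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"X-Api-Key": api_key}
    raise ValueError(f"unknown style: {style!r}")
```

Merge the result into the rest of your request headers, for example {**auth_headers(key), "Content-Type": "application/json"}.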

When to use embeddings

Embeddings convert input into a fixed-length float vector. Use them when you need to compare meaning rather than generate text:
  • Semantic search: embed documents at index time, embed the user’s query at search time, and return the top-K nearest documents by cosine similarity.
  • Retrieval-augmented generation (RAG): pull semantically relevant chunks from your knowledge base, then pass them into a chat completion as context.
  • Classification: embed labeled examples once, then embed each new input and assign it the label of its nearest labeled example.
  • Similarity ranking: deduplicate, cluster, or surface “more like this” recommendations.
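The following is a minimal in-memory sketch of the semantic-search and classification patterns above, using cosine similarity over plain Python lists. The function names are hypothetical, and toy vectors stand in for real aurous-embed-vision-1.0 output. This page does not say whether returned vectors are unit-normalized, so the sketch normalizes explicitly. A production system would typically use a vector index rather than a brute-force scan.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: Vector, docs: dict[str, Vector], k: int = 3) -> list[tuple[str, float]]:
    """Semantic search: the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def nearest_label(query: Vector, examples: list[tuple[str, Vector]]) -> str:
    """Classification: the label of the labeled example most similar to the query."""
    label, _ = max(
        ((label, cosine_similarity(query, vec)) for label, vec in examples),
        key=lambda pair: pair[1],
    )
    return label
```

For example, top_k([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}, k=2) ranks "a" first and "c" second.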

When NOT to use embeddings

  • Generating text: embeddings don’t produce text; use chat completions (POST /v1/chat/completions).
  • Generating images: use image generation (POST /v1/images).
  • N→N batch embedding: v1 does not accept string[] batch input and returns 400 embeddings_batch_not_supported for it. The underlying model would concatenate batched text into a single combined vector, the opposite of OpenAI’s N→N (one vector per input) semantics. To embed multiple items independently, loop client-side. See Multimodal for the rationale and the workaround.
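The client-side loop can be sketched as follows. The sketch assumes a hypothetical send(payload, idempotency_key) function that POSTs to /v1/embeddings and returns the parsed JSON body. Injecting the transport keeps the sketch testable without network access. Each item gets its own request, so N inputs produce N independent vectors in input order.

```python
import uuid
from typing import Callable

# Hypothetical transport: POSTs the payload to /v1/embeddings with the given
# Idempotency-Key and returns the decoded JSON response body.
SendFn = Callable[[dict, str], dict]


def embed_each(texts: list[str], send: SendFn,
               model: str = "aurous-embed-vision-1.0") -> list[list[float]]:
    """Embed each text in its own request, preserving input order (N inputs -> N vectors)."""
    vectors = []
    for text in texts:
        payload = {"model": model, "input": text}  # a single string, never a string[]
        key = str(uuid.uuid4())  # a fresh key per item; reuse it only when retrying that item
        response = send(payload, key)
        vectors.append(response["data"][0]["embedding"])  # v1 returns exactly one vector
    return vectors
```

The loop is sequential for clarity. If you parallelize it, keep the tokens-per-minute limit in mind (tpm_rate_limit_exceeded).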

Models

List available models at GET /v1/models. The day-1 embedding model is aurous-embed-vision-1.0. This multimodal embedding model has a 128K-token context window, handles text, image, and video input, and returns a single combined vector. Note that the endpoint currently rejects video_url parts (see embeddings_video_unsupported under Errors). Each model row has a top-level kind field whose value is either chat or embedding. Sending an embedding model to /v1/chat/completions, or a chat model to this endpoint, returns 400 model_wrong_kind. The context window, supported output dimensions, capabilities, and per-modality credit rates are returned in the aurous_metadata extension of each model row.
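To avoid model_wrong_kind, you can filter the /v1/models listing on the documented kind field before you choose a model. This sketch assumes the OpenAI-style list shape ({"object": "list", "data": [...]}) with each row's slug in an id field. The kind field is documented above, but the id field name and both helper names are assumptions.

```python
import json
import urllib.request


def fetch_models(api_key: str, base_url: str = "https://api.aurous-labs.com/v1") -> dict:
    """GET /v1/models and return the decoded JSON listing."""
    req = urllib.request.Request(f"{base_url}/models",
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def embedding_model_ids(listing: dict) -> list[str]:
    """Ids of rows whose top-level kind is "embedding" (assumes an OpenAI-style "id" field)."""
    return [row["id"] for row in listing.get("data", []) if row.get("kind") == "embedding"]
```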

Pricing

Embeddings are billed per token. Rates are listed on each model row at GET /v1/models (the embedding_pricing block), and the resulting charge appears in the usage block of every embedding response. The rate has three axes: text-token input, visual-token input (images), and video-token input. The rate on /v1/models is the caller’s effective rate, with any per-team override already applied. See Pricing for the credit math, worked examples, and the rate-mutability rules. To preview the cost of a request without being charged, send the same payload to POST /v1/embeddings/estimate. Estimates are intended as an upper bound, but for URL-fetched media the actual credits_charged can differ from the estimate by a few percent.
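The following sketch assumes that rates from the embedding_pricing block are expressed in credits per token. Under that assumption, the charge for a request is each modality's token count times its rate, summed across text, visual, and video. The sketch ignores any rounding; the Pricing page is authoritative. The function name and the layout of the rates dictionary are hypothetical, with keys chosen to mirror usage.breakdown.input.

```python
MODALITIES = ("text", "visual", "video")


def estimate_credits(tokens: dict[str, int], rates: dict[str, float]) -> dict:
    """Per-modality credits (tokens x per-token rate) and their total.

    `tokens` may omit unused modalities; `rates` must provide all three.
    """
    breakdown = {m: tokens.get(m, 0) * rates[m] for m in MODALITIES}
    return {"credits": sum(breakdown.values()), "input": breakdown}
```

As an illustration only, the sample under Response shape charges 0.000225 credits for 12 text tokens. That works out to 0.00001875 credits per text token, and estimate_credits({"text": 12}, {"text": 0.00001875, "visual": 0.0, "video": 0.0}) returns approximately 0.000225 credits. Read real rates from GET /v1/models.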

Quick start

curl -X POST https://api.aurous-labs.com/v1/embeddings \
  -H "Authorization: Bearer $AUROUS_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "aurous-embed-vision-1.0",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
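The same request in Python uses only the standard library (urllib.request) rather than an SDK. The helper names build_embedding_request and embed are hypothetical. The URL, headers, and body mirror the curl command above.

```python
import json
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.aurous-labs.com/v1"


def build_embedding_request(api_key: str, text: str,
                            model: str = "aurous-embed-vision-1.0",
                            idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/embeddings request from the curl example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Same role as $(uuidgen) in the curl example.
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )


def embed(api_key: str, text: str) -> list[float]:
    """Send the request and return the single embedding vector in data[0]."""
    req = build_embedding_request(api_key, text)
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["data"][0]["embedding"]
```

Call embed(os.environ["AUROUS_API_KEY"], "The quick brown fox jumps over the lazy dog.") to get the vector. urlopen raises urllib.error.HTTPError for non-2xx responses, so wrap the call if you want to inspect the error envelope.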

Response shape

Responses follow the standard OpenAI shape, with an Aurous extension on usage:
{
  "object": "list",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, /* ... */]
    }
  ],
  "model": "aurous-embed-vision-1.0",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12,
    "credits_charged": 0.000225,
    "breakdown": {
      "input": { "text": 0.000225, "visual": 0, "video": 0 },
      "model": "aurous-embed-vision-1.0"
    }
  }
}
The usage.credits_charged value is the exact amount deducted from your team’s balance. The usage.breakdown.input block decomposes that charge across the three input modalities so you can correlate cost with input shape. The text, visual, and video keys are always present, with 0 for any modality the request did not use. On v1, data is always a single-element array (one combined embedding per request). See Multimodal for the rationale.
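Below is a small parser for the response shape above, sketched under the assumption that the body has already been decoded from JSON. It enforces the v1 guarantee that data holds exactly one embedding. It also checks that all three modality keys are present and returns the per-modality breakdown alongside the total. The function name is hypothetical.

```python
def parse_embedding_response(body: dict) -> tuple[list[float], float, dict[str, float]]:
    """Return (embedding, credits_charged, breakdown_by_modality) from a v1 response."""
    data = body["data"]
    if len(data) != 1:
        # v1 always returns one combined embedding per request.
        raise ValueError(f"expected one embedding, got {len(data)}")
    usage = body["usage"]
    breakdown = usage["breakdown"]["input"]
    missing = {"text", "visual", "video"} - set(breakdown)
    if missing:
        raise ValueError(f"breakdown is missing modalities: {sorted(missing)}")
    return data[0]["embedding"], usage["credits_charged"], breakdown
```

Because the breakdown decomposes the total charge, summing its three values should give credits_charged, up to any rounding described on the Pricing page.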

Idempotency

Pass an Idempotency-Key header (any opaque value of 1–256 characters; UUID v4 recommended) to make POST /v1/embeddings safe to retry. Reusing a key with the same body within 24 hours replays the cached response, marked with the header Aurous-Idempotent-Replayed: true. Reusing a key with a different body returns 409 idempotency_key_in_use. See Idempotency for the full semantics.
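The key property is that one Idempotency-Key is generated per logical request and reused on every retry of it. The sketch below assumes a hypothetical send(payload, key) transport that returns (status, headers, body) and raises ConnectionError on network failures. Other error statuses are surfaced to the caller rather than retried. The function name is hypothetical.

```python
import uuid
from typing import Callable

# Hypothetical transport: send(payload, idempotency_key) -> (status, headers, body).
# It raises ConnectionError when the request may not have completed.
Send = Callable[[dict, str], tuple[int, dict, dict]]


def post_idempotently(payload: dict, send: Send, max_attempts: int = 3) -> tuple[dict, bool]:
    """POST with a single Idempotency-Key reused across every retry.

    Returns (body, replayed); replayed is True when the response carried
    Aurous-Idempotent-Replayed: true, meaning the cached response was returned.
    """
    key = str(uuid.uuid4())  # generated once, outside the retry loop
    last_error = None
    for _ in range(max_attempts):
        try:
            status, headers, body = send(payload, key)
        except ConnectionError as exc:
            last_error = exc  # same key + same body, so retrying is safe
            continue
        if 200 <= status < 300:
            lowered = {name.lower(): value for name, value in headers.items()}
            replayed = lowered.get("aurous-idempotent-replayed", "").lower() == "true"
            return body, replayed
        if status == 409:
            # idempotency_key_in_use: this key was already used with a different body.
            raise RuntimeError("idempotency_key_in_use")
        raise RuntimeError(f"request failed with HTTP {status}")
    raise RuntimeError(f"no response after {max_attempts} attempts") from last_error
```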

Limits

  • 16 content parts maximum per request, across all modalities combined.
  • 8 image_url parts maximum per request.
  • 1 video_url part maximum per request (video_url parts are currently rejected with embeddings_video_unsupported; see Errors).
  • 128K tokens of total input across text, image, and video, measured after tokenization. Requests over the cap return 400 embeddings_input_too_large.
  • Image and video URLs must be fetchable over HTTPS in under 10 seconds. URL strings are capped at 2048 characters.
See Multimodal for the input shape and the worked multimodal example.
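Checking the count and URL limits on the client avoids a round trip for requests that would be rejected. This is a sketch. It assumes content parts shaped like OpenAI chat content parts ({"type": "image_url", "image_url": {"url": ...}}, and likewise for video_url), which may differ from the real shape on the Multimodal page. It does not check the 128K-token cap or URL fetch time, because both depend on server-side work. Because video_url parts are currently rejected outright, any video part is flagged regardless of the one-video cap. The function name and the URL-problem messages are hypothetical; only the two error codes come from this page.

```python
from urllib.parse import urlparse

MAX_PARTS = 16
MAX_IMAGES = 8
MAX_URL_CHARS = 2048


def preflight_problems(parts: list[dict]) -> list[str]:
    """Client-side checks for count and URL limits; returns a list of problems (empty = OK)."""
    problems = []
    images = [p for p in parts if p.get("type") == "image_url"]
    videos = [p for p in parts if p.get("type") == "video_url"]
    if len(parts) > MAX_PARTS or len(images) > MAX_IMAGES:
        problems.append("embeddings_input_too_many_items")
    if videos:
        # Rejected outright: submit a representative frame as image_url instead.
        problems.append("embeddings_video_unsupported")
    for part in images:
        url = part["image_url"]["url"]
        if len(url) > MAX_URL_CHARS or urlparse(url).scheme != "https":
            problems.append(f"image URL must be https and at most {MAX_URL_CHARS} chars: {url[:60]}")
    return problems
```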

Errors

Every non-2xx response uses the standard Aurous error envelope — see Errors for the full taxonomy. Embedding-specific codes you might see:
  • model_not_found (404): unknown model slug. Check the /v1/models listing.
  • model_disabled (403): the model exists, but an admin has deactivated it.
  • model_wrong_kind (400): you sent a chat model to the embeddings endpoint (or an embedding model to chat completions).
  • embeddings_batch_not_supported (400): input is a string[] array. v1 doesn’t accept batch input; loop client-side. See Multimodal.
  • embeddings_input_too_many_items (400): the request exceeds the 16-part or 8-image cap. Split it into multiple requests.
  • embeddings_video_unsupported (400): any video_url part is rejected (changed 2026-05-24). Extract a representative frame in your pipeline and submit it as an image_url part, which bills at the visual rate.
  • embeddings_input_too_large (400): the pre-fetch token estimate exceeds the model’s context window. Trim the input.
  • embeddings_unsupported_dimensions (400): this model does not support the requested dimensions value. Check aurous_metadata.dimensions on the model row. See Dimensions.
  • tpm_rate_limit_exceeded (429): the tokens-per-minute bucket is empty. Wait for the duration given in the Retry-After header, then retry.
  • embeddings_provider_unknown_error (502): the upstream returned an error with no specific mapping. Retry with backoff.

Related

  • Multimodal: the input-shape details and the multimodal worked example.
  • Dimensions: when the dimensions parameter is accepted.
  • Pricing: per-token credit math and worked examples.
  • Estimate: preview cost without charging.
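The retry guidance for tpm_rate_limit_exceeded and embeddings_provider_unknown_error in the error list above can be sketched as a delay function. The sketch assumes you have already read the error code from the Aurous error envelope (its field layout is on the Errors page) and that Retry-After carries a number of seconds. The function name and backoff constants are hypothetical. The other embedding-specific codes signal problems that retrying will not fix, so the sketch returns None for them.

```python
import random
from typing import Mapping, Optional

RETRYABLE = {"tpm_rate_limit_exceeded", "embeddings_provider_unknown_error"}


def retry_delay(code: str, attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable.

    429 tpm_rate_limit_exceeded: honor a numeric Retry-After header when present.
    Otherwise (including 502 embeddings_provider_unknown_error): capped
    exponential backoff with full jitter.
    """
    if code not in RETRYABLE:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    if code == "tpm_rate_limit_exceeded" and "retry-after" in lowered:
        try:
            return max(0.0, float(lowered["retry-after"]))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff below
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

When you do retry, reuse the same Idempotency-Key so the retry stays safe.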