https://api.preprod.aurous-labs.com and the output is shown verbatim.
1. Authenticate
Create a team in the dashboard and mint a key underSettings → API keys. The plaintext is shown once at creation — store it in a secret manager. Keys look like al_live_<64-hex>.
Send the key in the X-Api-Key header, OR as Authorization: Bearer al_live_... (OpenAI SDKs do this automatically). A 2-second proof-of-life:
401 invalid_api_key, double-check the header name and the value.
2. Chat completions
The chat surface is intentionally OpenAI-compatible. Most developers re-use the official OpenAI SDK with two lines changed — thebaseURL and the apiKey.
choices[0].message.content, plus a usage block that includes the token counts and the credit charge. Every chat completion mints a unique cmp_<ulid> id you can fetch later with GET /v1/chat/completions/{id}.
- Streaming: pass
stream: trueto receive SSE frames as they arrive. See Chat streaming. - Tool calling: pass
tools+tool_choicefor function calling. See Chat tools. - Structured output: pass
response_format: { type: "json_schema", json_schema: {...} }for schema-enforced JSON output. - Multimodal: include
image_urlcontent parts in messages. See Chat multimodal. - Reasoning effort: set
reasoning_effort: "high"for reasoning-capable models. See Chat reasoning.
3. Embeddings
POST /v1/embeddings turns text, images, and video into a single high-dimensional vector you can store, index, and compare. The same OpenAI SDK pattern works — change two lines and call client.embeddings.create.
cURL (text + image)
input: [...] is a string array in OpenAI’s TypeScript types). To send multimodal input via the OpenAI SDK, use the lower-level client.post() escape hatch — see Multimodal embeddings. For a deeper note on why input: ["a","b","c"] is not supported, see OpenAI batch incompat.
Want to know the cost without billing? POST /v1/embeddings/estimate returns the same response shape minus the vector, no credits charged.
4. Image generation
POST /v1/images is the image-generation surface — pick a LoRA from GET /v1/loras (or omit lora_id and let our dispatcher pick a style based on your prompt), wait ~10-30 seconds, fetch the bytes.
pending (or processing) state. Poll GET /v1/images/{id} until status is one of succeeded, failed, cancelled, expired, or moderation_rejected — most generations finish in 10-30 seconds.
output_urls[0] is an anonymous-read proxy URL on api.aurous-labs.com. Fetch it directly — no auth header needed. Image output URLs expire ~7 days after generation (410 Gone with code: output_expired after that); video output URLs expire ~24 hours after generation. Save what you want to keep.
Response headers worth knowing
Every response carries:Aurous-Request-Id: req_<26-char ULID>— quote this in any support ticket so we can find your request.Aurous-Version: YYYY-MM-DD— the API version applied to this response. Pin a specific version with theAurous-Versionrequest header to insulate yourself from future changes.X-RateLimit-Limit/X-RateLimit-Remaining/X-RateLimit-Reset— see Rate limits.- On
POST /v1/chat/completionsandPOST /v1/embeddings:X-RateLimit-TPM-Limit/X-RateLimit-TPM-Remaining/X-RateLimit-TPM-Reset— a separate tokens-per-minute bucket on top of the standard RPM bucket. - On idempotent writes (chat, embeddings, images, videos):
Aurous-Idempotent-Replayed: truewhen the response is a replay of a prior successful call with the sameIdempotency-Key.
Where to next?
- Authentication — key formats, scopes, rotation, the 24h grace window
- Errors — every type, every code, what to do
- Idempotency — safe retries on
POST /v1/chat/completions,/v1/embeddings,/v1/images,/v1/videos - Rate limits — per-team RPM + per-team TPM buckets, headers, what
429means - Webhooks — signed deliveries, retries, secret rotation
- API reference — every endpoint, every parameter

