URL fetching for image_url

When you pass image_url: { url: "..." } to POST /v1/embeddings, the platform fetches the image bytes server-side at request time and feeds them to the embedding model. This page covers the fetch guards (what we block to prevent SSRF and abuse) and the failure modes (what error you get when the URL is unreachable).

As of 2026-05-24, video_url parts are rejected with embeddings_video_unsupported before any fetch is attempted. Extract a representative frame in your pipeline and submit it as image_url; it bills at the visual rate.
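If your source media is video, one way to pull a representative frame is ffmpeg. The sketch below is not part of the Aurous API: it assumes ffmpeg is installed and on your PATH, and the function names, file paths, and the 1-second default offset are illustrative choices only.

```python
import subprocess


def frame_extract_command(video_path: str, out_path: str, at_seconds: float = 1.0) -> list:
    """Build an ffmpeg command that writes a single frame to out_path."""
    return [
        "ffmpeg",
        "-y",                    # overwrite out_path if it already exists
        "-ss", str(at_seconds),  # seek to the chosen offset in the input
        "-i", video_path,
        "-frames:v", "1",        # stop after one video frame
        out_path,
    ]


def extract_frame(video_path: str, out_path: str, at_seconds: float = 1.0) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(frame_extract_command(video_path, out_path, at_seconds), check=True)
```

Upload the resulting frame to your CDN and pass its https:// URL as image_url.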

The fetch is server-side

Aurous fetches the URL from our infrastructure, not from your browser or server. Implications:

The URL must be publicly reachable from our cloud; a URL that resolves only inside your private network will fail
Our IP appears in your CDN’s logs as the requester, not your end user’s IP
Authentication carried in the URL itself (signed URL parameters, query-string tokens) is preserved, because we issue the GET request against the URL exactly as you sent it

We do not store the fetched bytes after the embedding completes (no cache; bytes are streamed into the model and discarded).

Allowed schemes

https:// — accepted

Blocked schemes:

http:// — rejected (security; we require TLS)
data: — rejected (use direct base64 inside the request body via image_url.b64_json if you have inline bytes; that’s a separate ingest path)
ftp://, file://, gopher://, etc. — rejected
Bare hostnames without a scheme — rejected

Attempting any blocked scheme returns 400 invalid_request from the DTO validator before the request reaches the fetch layer.
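You can catch most scheme rejections before sending the request. The following is a minimal client-side sketch using only the Python standard library; is_fetchable_scheme is a hypothetical helper name, and the platform's validator may apply rules beyond what is shown here.

```python
from urllib.parse import urlsplit


def is_fetchable_scheme(url: str) -> bool:
    """Return True only for absolute https:// URLs that include a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises on malformed input such as unbalanced IPv6 brackets
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```

For example, is_fetchable_scheme("https://example.com/cat.jpg") passes, while http:// URLs, data: URIs, and bare hostnames such as example.com/cat.jpg do not.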

Blocked address ranges

Even with a valid https:// URL, the resolved IP must NOT fall into one of these ranges:

Loopback: 127.0.0.0/8, ::1
Link-local: 169.254.0.0/16 (incl. cloud metadata 169.254.169.254), fe80::/10
RFC1918 private: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Unique local IPv6: fc00::/7
Multicast / reserved: 224.0.0.0/4, 240.0.0.0/4
Cloud metadata endpoints: hostnames and addresses such as metadata.google.internal, metadata.aws.com, 169.254.170.2, etc.

Resolving to a blocked address returns 400 invalid_request with a message of the form url host ... is a private / loopback / link-local IPv4 address, and param set to the offending content-parts path (input[*].image_url.url for embeddings, messages[*].content[*].image_url.url for chat). The error code is shared across both surfaces.

The range check runs after DNS resolution, against the resolved address. We do not pin the resolved IP across the request lifecycle, but we do re-check the IP at TLS-connect time, so a hostname that passes the first lookup and then resolves to a private address on a later lookup (DNS rebinding) is still blocked.
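To check ahead of time whether a host would land in one of the CIDR ranges listed above, a preflight along these lines can help. It is a sketch only: the helper names are hypothetical, it does not cover the hostname-based metadata blocklist, and DNS answers seen from your machine may differ from what our fetcher sees.

```python
import ipaddress
import socket

# CIDR ranges copied from the list above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16", "fe80::/10",                     # link-local
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC1918 private
        "fc00::/7",                                        # unique local IPv6
        "224.0.0.0/4", "240.0.0.0/4",                      # multicast / reserved
    )
]


def is_blocked_address(ip: str) -> bool:
    """Return True if the literal IP falls inside any listed range."""
    addr = ipaddress.ip_address(ip.split("%", 1)[0])  # drop any IPv6 zone id
    return any(
        addr.version == net.version and addr in net for net in BLOCKED_NETWORKS
    )


def host_resolves_to_blocked(host: str) -> bool:
    """Resolve host locally and report whether any answer is in a blocked range."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return any(is_blocked_address(info[4][0]) for info in infos)
```

A passing preflight does not guarantee acceptance, since the platform re-checks the address at connect time.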

Timeout

Each URL fetch has a 10-second timeout. If the first byte does not arrive within that window, the request returns 502 chat_provider_request_invalid with detail url_fetch_timeout. Common causes:

The host is geo-distant and TLS handshake is slow
The host is rate-limiting our IP
The asset is huge (>500MB) and slow to deliver

For consistent latency, host your media on a CDN with global PoPs. Cloudflare R2 + the Cloudflare CDN, S3 + CloudFront, GCS + Cloud CDN, etc. are all fine.

Size limits

Image fetch cap: 50 MB per image
Video fetch cap: 500 MB per video

Exceeding the cap returns 400 invalid_request with detail url_size_exceeded. The cap is enforced by streaming the body and aborting when the size is exceeded — we do not download then check.
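The stream-and-abort pattern described above can be illustrated as follows. This is a generic sketch of the technique, not our fetcher's code: read_capped and SizeExceeded are hypothetical names, and the chunk size is arbitrary.

```python
import io


class SizeExceeded(Exception):
    """Raised as soon as the body is known to exceed the cap."""


def read_capped(stream, cap_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, aborting once more than cap_bytes arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)  # end of stream, total stayed within the cap
        if len(buf) + len(chunk) > cap_bytes:
            # Stop here; the remainder of the body is never read.
            raise SizeExceeded(f"body exceeds {cap_bytes} bytes")
        buf.extend(chunk)


# Usage: a 100-byte body passes under a 100-byte cap.
body = read_capped(io.BytesIO(b"x" * 100), cap_bytes=100)
```

Because the check runs per chunk, an oversized asset costs at most one chunk beyond the cap rather than a full download.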

Content-type validation

We require the response Content-Type header to match the media kind:

Image URLs: Content-Type must start with image/ (e.g. image/jpeg, image/png, image/webp)
Video URLs: Content-Type must start with video/ (e.g. video/mp4, video/quicktime)

A mismatch (e.g. an image_url pointing at an application/octet-stream) returns 400 invalid_request with detail url_content_type_mismatch. Set the Content-Type on your CDN or origin correctly — most CDNs do this automatically based on the file extension.
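A quick way to verify what your origin returns is to compare its Content-Type header against the expected prefix. The sketch below mirrors the prefix rule stated above; content_type_matches is a hypothetical helper, and the case-insensitive comparison is an assumption about how the platform treats the header.

```python
# Expected Content-Type prefix per media kind, as listed above.
EXPECTED_PREFIX = {"image": "image/", "video": "video/"}


def content_type_matches(kind: str, content_type: str) -> bool:
    """Return True if the header value starts with the prefix for this kind."""
    prefix = EXPECTED_PREFIX[kind]
    # Assumption: media types are compared case-insensitively.
    return content_type.strip().lower().startswith(prefix)
```

For instance, content_type_matches("image", "application/octet-stream") is False, which corresponds to the url_content_type_mismatch case.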

4xx / 5xx upstream

If the URL returns a non-2xx status, the upstream fetch failure surfaces as a 502 with the provider-error envelope. The code name still reads chat_provider_unknown_error even on the embeddings surface today — that’s tracked for a v1.1 rename to a surface-agnostic provider_request_invalid. For now, both surfaces share the same code:

{
  "error": {
    "type": "server_error",
    "code": "chat_provider_unknown_error",
    "message": "Failed to fetch image at https://example.com/missing.jpg: upstream returned 404 Not Found",
    "doc_url": "https://docs.aurous-labs.com/errors#chat_provider_unknown_error",
    "request_id": "req_..."
  }
}

The same 502 envelope covers any failure to retrieve the asset:

404 (asset moved / not yet uploaded)
403 (access control denied us)
5xx (origin down)
TLS errors (expired cert, hostname mismatch)
DNS resolution failure

In all cases, the platform does NOT bill the request — no hold is committed.
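On the client side, the fetch-related errors on this page fall into two groups: 400 invalid_request (change the URL or the asset) and 502 provider errors (the asset could not be retrieved). A minimal sketch of that split, assuming the envelope shape shown above; classify_fetch_error and its return labels are hypothetical, and what you do with each label (retry, alert, skip) is up to your pipeline.

```python
import json


def classify_fetch_error(status: int, body: dict) -> str:
    """Map an error response to a coarse category for client-side handling."""
    error = body.get("error") or {}
    code = error.get("code", "")
    if status == 400 and code == "invalid_request":
        return "fix_request"            # blocked scheme/address, size, content type
    if status == 502 and "provider" in code:
        # Matches today's chat_provider_* codes and the planned rename.
        return "upstream_fetch_failed"  # timeout, non-2xx, TLS, or DNS failure
    return "other"


# The sample envelope from this page.
SAMPLE = json.loads("""
{
  "error": {
    "type": "server_error",
    "code": "chat_provider_unknown_error",
    "message": "Failed to fetch image at https://example.com/missing.jpg: upstream returned 404 Not Found",
    "doc_url": "https://docs.aurous-labs.com/errors#chat_provider_unknown_error",
    "request_id": "req_..."
  }
}
""")
```

Log request_id alongside the category so a failed fetch can be traced later.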

Recommended URL hygiene

For reliable embedding pipelines:

Upload to a CDN with stable URLs. R2 + Cloudflare, S3 + CloudFront, Bunny CDN. Avoid direct origin hosting on a single VM.
Use signed URLs with a short TTL. A 1-hour expiry is fine — we fetch immediately on request.
Set Cache-Control: public, max-age=3600 on the CDN response — lets the CDN edge-cache the asset, which makes our fetch fast on the second call.
Use image/webp or video/mp4 — those are universally supported.
Downscale images before upload. Resizing a 4096×4096 PNG to 1024×1024 leaves 16× fewer pixels and cuts our per-image visual token count by ~16× with negligible semantic loss.
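The ~16× figure follows from the pixel ratio, assuming the visual token count scales roughly with pixel count. The sketch below works through that arithmetic and computes aspect-preserving target dimensions; both helper names and the 1024-pixel bound are illustrative.

```python
def pixel_reduction(src: tuple, dst: tuple) -> float:
    """Ratio of source pixels to destination pixels for (width, height) pairs."""
    return (src[0] * src[1]) / (dst[0] * dst[1])


def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longer side is at most max_side.

    Images already within the bound are returned unchanged (never upscaled).
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


# 4096x4096 -> 1024x1024 is a 16x reduction in pixels.
ratio = pixel_reduction((4096, 4096), (1024, 1024))
```

Feed the dimensions from fit_within to whatever image library your pipeline already uses for the actual resize.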

Where to next?

Multimodal embeddings — the full content-parts surface
Embedding limits — caps on parts, characters, URLs
Error codes — full taxonomy
POST /v1/embeddings — endpoint reference
