Limits
Two axes constrain a single call: how big it is (per-request size, PRS) and how often you make it (requests per minute, RPM). Both axes return a 4xx with a structured error.code when they fire — no surprise 5xx, no silent truncation.
Per-request size (PRS)
| Class | Endpoint | Cap |
|---|---|---|
| READS | scroll · search · retrieve | ≤ 1000 points/call · ≤ 10 MB/response |
| WRITES | upsert | ≤ 10 K vectors/call · streaming |
| WRITES | payload edits · batch delete | ≤ 512 points/call |
Upserts stream — there is no body-size cap on the public upsert URL. The 10 K vectors/call is a defensive request-level ceiling, not a body limit; one call can upload millions of bytes via byte-target chunking on the receiving end.
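The per-call caps in the table can be respected on the client by slicing a large job into batches before sending. The following is a minimal sketch, not part of either SDK: the helper name `batched` and the two constants are hypothetical, and the actual upsert or edit call is left out because its signature is not shown here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Caps copied from the table above; the constant names are illustrative only.
MAX_UPSERT_VECTORS = 10_000  # upsert: vectors per call
MAX_EDIT_POINTS = 512        # payload edits / batch delete: points per call


def batched(items: Sequence[T], cap: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `cap` entries."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    for start in range(0, len(items), cap):
        yield items[start:start + cap]
```

Each yielded slice can then be passed to one upsert (cap `MAX_UPSERT_VECTORS`) or one payload-edit or batch-delete call (cap `MAX_EDIT_POINTS`). Every batch still counts as one request against the RPM budget.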
For bulk reads, use the SDK's scroll_iter() / scrollIter(), which pages transparently and stays within both quotas. See the Python and JS SDK READMEs for the recipe.
Requests per minute (RPM)
A sliding-window minutely cap derived from plans.requests_per_minute on your subscription tier. The exact number depends on the tier; your current value is visible in the dashboard under Billing → Plan.
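A client can avoid most 429s by tracking its own sliding window. The sketch below is a client-side approximation under stated assumptions: the class name `SlidingWindowThrottle` is hypothetical, the server's exact window bookkeeping is not documented here, and `limit` must be set by hand from the value shown in the dashboard.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Client-side guard: allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recorded calls

    def wait_time(self) -> float:
        """Seconds until the next call fits in the window (0.0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        # The oldest call must leave the window before another one fits.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds if it is non-zero, then call `record()` once the request is sent. The server remains the authority, so a 429 can still occur and should still be handled.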
When the limit fires, the response is HTTP 429 with error.code = "RATE_LIMITED" and a Retry-After header (seconds). Both SDKs surface this as RateLimitExceededError with a structured retry_after field on the exception.
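Handling the 429 usually means waiting for the advertised interval and trying again. The sketch below assumes only what is stated above, namely an exception carrying a retry_after value in seconds. The `RateLimitExceededError` class defined here is a local stand-in (the real SDK import path is not shown in this section), and `call_with_retry` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceededError(Exception):
    """Stand-in for the SDK exception; replace with the SDK's own class."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `fn`, sleeping for `retry_after` seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitExceededError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(exc.retry_after)
    raise AssertionError("unreachable")
```

Wrap any single SDK call in a zero-argument function or lambda and pass it as `fn`. The `sleep` parameter exists so the wait can be replaced in tests.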
When each axis fires
- PRS fires synchronously on the request that violates it: body parsed, request rejected with HTTP 400 (validation) or 413 (oversized body) and a VALIDATION_ERROR / RESPONSE_TOO_LARGE code.
- RPM fires when your sliding-window count exceeds the tier's value, regardless of the size of any individual call. Read and write traffic share the same minutely budget unless your tier exposes split quotas.
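The two axes call for different reactions, so it can help to classify a failed response before deciding what to do. The function below is a rough sketch built only from the status codes and error.code values named in this section. The name `limit_axis` is hypothetical, and a 400 with VALIDATION_ERROR may also come from validation failures unrelated to size, so treat a "PRS" result as a hint.

```python
from typing import Optional


def limit_axis(status: int, code: Optional[str]) -> Optional[str]:
    """Return "RPM", "PRS", or None for a response's HTTP status and error.code."""
    if status == 429 and code == "RATE_LIMITED":
        return "RPM"  # wait for Retry-After, then retry the same request
    if status in (400, 413) and code in ("VALIDATION_ERROR", "RESPONSE_TOO_LARGE"):
        return "PRS"  # retrying unchanged will fail again; shrink the request
    return None       # not one of the limit errors described here
```

An "RPM" result is worth retrying after the advertised delay, while a "PRS" result means the request itself has to be split or narrowed.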