Limits

Two axes constrain a single call: how big it is (per-request size, PRS) and how often you make it (requests per minute, RPM). When either limit fires, the API returns a 4xx with a structured error.code: no surprise 5xx, no silent truncation.

Per-request size (PRS)

Class    Endpoint                        Cap
READS    scroll · search · retrieve      ≤ 1000 points/call · ≤ 10 MB/response
WRITES   upsert                          ≤ 10 K vectors/call · streaming
         payload edits · batch delete    ≤ 512 points/call

Upserts stream — there is no body-size cap on the public upsert URL. The 10 K vectors/call is a defensive request-level ceiling, not a body limit; one call can upload millions of bytes via byte-target chunking on the receiving end.
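Even with streaming upserts, a client still has to keep each call under the per-call item caps. The following is a minimal sketch of client-side batching, not SDK code: the operation names used as dictionary keys and the helper names chunked and batches_for are hypothetical, while the numeric caps are taken from the table above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Per-call item caps from the PRS table. The keys are illustrative names,
# not endpoint identifiers defined by the API.
MAX_ITEMS_PER_CALL = {
    "upsert": 10_000,       # vectors per upsert call
    "payload_edit": 512,    # points per payload-edit call
    "batch_delete": 512,    # points per batch-delete call
}


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batches_for(operation: str, items: Iterable[T]) -> Iterator[List[T]]:
    """Split `items` so that every batch respects the cap for `operation`."""
    return chunked(items, MAX_ITEMS_PER_CALL[operation])
```

Each batch would then be passed to the corresponding SDK call. Because the helper works on any iterable, the full data set never has to be held in memory at once.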

For bulk reads, use the SDK's scroll_iter() / scrollIter(): it pages transparently and stays within both quotas. See the Python and JS SDK READMEs for the recipe.
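To illustrate what transparent paging involves, here is a rough sketch of a cursor-driven generator. It is not the SDK's implementation: fetch_page is a hypothetical callable standing in for one scroll request, and its (cursor, limit) signature and return shape are assumptions. Only the 1000 points/call ceiling comes from the table above, and this sketch handles the size limit only, not RPM.

```python
from typing import Callable, Iterator, List, Optional, Tuple

MAX_POINTS_PER_CALL = 1000  # read cap from the PRS table

# Hypothetical page fetcher: takes (cursor, limit) and returns
# (points, next_cursor). next_cursor is None once the last page is reached.
FetchPage = Callable[[Optional[int], int], Tuple[List[dict], Optional[int]]]


def iter_points(
    fetch_page: FetchPage, page_size: int = MAX_POINTS_PER_CALL
) -> Iterator[dict]:
    """Yield every point, requesting at most `page_size` points per call."""
    if not 1 <= page_size <= MAX_POINTS_PER_CALL:
        raise ValueError(f"page_size must be in 1..{MAX_POINTS_PER_CALL}")
    cursor: Optional[int] = None
    while True:
        points, cursor = fetch_page(cursor, page_size)
        yield from points
        if cursor is None:
            return
```

The caller iterates over points as if they were one sequence, and no single request asks for more than the per-call cap.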

Requests per minute (RPM)

RPM is a sliding-window, per-minute cap derived from plans.requests_per_minute on your subscription tier. The exact number depends on the tier; your current value is visible in the dashboard under Billing → Plan.
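A client can avoid most 429s by throttling itself with the same kind of window. The sketch below is a client-side approximation only: the server's exact accounting is not described here, the class name SlidingWindowLimiter is hypothetical, and the limit you pass in should be the value shown for your tier.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `limit` calls in any `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if limit < 1:
            raise ValueError("limit must be >= 1")
        self.limit = limit
        self.window = window
        self._clock = clock
        self._stamps: Deque[float] = deque()  # send times, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        # The window is full; a slot frees up when the oldest call expires.
        return self.window - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._stamps.append(self._clock())
```

Before each request, sleep for wait_time() seconds if it is non-zero, then call record(). The injectable clock exists so the limiter can be tested without real delays.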

When the limit fires, the response is HTTP 429 with error.code = "RATE_LIMITED" and a Retry-After header (seconds). Both SDKs surface this as RateLimitExceededError with a structured retry_after field on the exception.
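One way to act on the retry_after field is a small retry wrapper. This is a sketch under stated assumptions: the RateLimitExceededError class below is a local stand-in so the example runs on its own (the real SDK exception may have a different constructor), and call_with_retry is a hypothetical helper, not part of either SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceededError(Exception):
    """Stand-in for the SDK exception of the same name (assumed shape)."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, waiting `retry_after` seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitExceededError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(max(0.0, float(exc.retry_after)))
    raise AssertionError("unreachable")
```

Waiting for the server-provided interval, rather than a fixed backoff, keeps the client from retrying before the window has room again.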

When each axis fires

  • PRS fires synchronously on the request that violates it: the body is parsed and the request is rejected with HTTP 400 (validation) or 413 (oversized body) and a VALIDATION_ERROR / RESPONSE_TOO_LARGE code.
  • RPM fires when your sliding-window count exceeds the tier's value, regardless of the size of any individual call. Read and write traffic share the same per-minute budget unless your tier exposes split quotas.
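The two axes call for different reactions: an RPM rejection is worth retrying after a wait, while a PRS rejection will fail again until the request is made smaller. A minimal sketch of telling them apart from the status and error.code values listed above follows; the function name limit_axis is hypothetical.

```python
from typing import Optional


def limit_axis(status: int, error_code: str) -> Optional[str]:
    """Return "RPM" or "PRS" for a limit rejection, or None for anything else."""
    if status == 429 and error_code == "RATE_LIMITED":
        return "RPM"  # wait for Retry-After, then retry the same request
    # Assumption: a 400 VALIDATION_ERROR can also signal validation failures
    # unrelated to size, so treat "PRS" here as "inspect the request first".
    if status in (400, 413) and error_code in (
        "VALIDATION_ERROR",
        "RESPONSE_TOO_LARGE",
    ):
        return "PRS"  # shrink or split the request before retrying
    return None
```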