Limits

Two axes constrain your API calls: how large each call is (per-request size, PRS) and how often you make calls (requests per minute, RPM). When either limit fires, the API returns a 4xx response with a structured error.code. There is no surprise 5xx and no silent truncation.

Per-request size (PRS)

Class	Endpoint	Per-call cap
READS	scroll · search · retrieve	≤ 1,000 points/call · ≤ 10 MB/response
WRITES	upsert	≤ 10,000 vectors/call (streamed body, no size cap)
WRITES	payload edits · batch delete	≤ 512 points/call

Upserts are streamed, so the public upsert endpoint has no body-size cap. The 10,000 vectors/call limit is a defensive per-request ceiling, not a body limit. The server splits the incoming stream into byte-targeted chunks, so a single call can upload millions of bytes.
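Client code that upserts or deletes more than one call's worth of points has to split its input first. Here is a minimal sketch that assumes plain Python lists of vectors or point IDs. `chunked` and the two constant names are hypothetical and are not part of either SDK. Each yielded batch would go to one upsert or one batch-delete call.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Per-call caps from the PRS table (hypothetical constant names).
UPSERT_MAX_VECTORS = 10_000
EDIT_DELETE_MAX_POINTS = 512


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Example: 1,200 point IDs to delete become three calls of 512, 512 and 176.
ids_to_delete = list(range(1_200))
delete_batches = list(chunked(ids_to_delete, EDIT_DELETE_MAX_POINTS))
```

Batching by count alone is enough for upserts because their bodies have no size cap.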

For bulk reads, use the SDK's scroll_iter() (Python) or scrollIter() (JS). It pages transparently and stays within both limits. See the Python and JS SDK READMEs for the recipe.
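If you page manually instead, the core loop might look like the sketch below. It is not the SDK's implementation. `fetch_page` is a hypothetical stand-in for one scroll call. It is assumed to take a cursor and a limit and return a page of points plus the next cursor, which is None at the end. The sketch only enforces the 1,000-points/call cap and does not pace calls for RPM.

```python
from typing import Callable, Iterator, List, Optional, Tuple

READ_MAX_POINTS = 1000  # per-call cap for scroll / search / retrieve

# Hypothetical signature for one scroll call: (cursor, limit) -> (points, next_cursor).
# next_cursor is None once the last page has been returned.
FetchPage = Callable[[Optional[str], int], Tuple[List[dict], Optional[str]]]


def iter_points(fetch_page: FetchPage, page_size: int = READ_MAX_POINTS) -> Iterator[dict]:
    """Yield every point, requesting at most page_size points per call."""
    if not 1 <= page_size <= READ_MAX_POINTS:
        raise ValueError(f"page_size must be between 1 and {READ_MAX_POINTS}")
    cursor: Optional[str] = None
    while True:
        points, cursor = fetch_page(cursor, page_size)
        yield from points
        if cursor is None:
            return


# In-memory stand-in for a collection, used only to exercise the loop.
_DATA = [{"id": i} for i in range(2_500)]


def fake_fetch(cursor: Optional[str], limit: int) -> Tuple[List[dict], Optional[str]]:
    start = int(cursor) if cursor is not None else 0
    page = _DATA[start:start + limit]
    end = start + len(page)
    return page, (str(end) if end < len(_DATA) else None)
```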

Requests per minute (RPM)

RPM is a sliding-window, per-minute cap taken from plans.requests_per_minute for your subscription tier. The exact number depends on the tier. Your current value is shown in the dashboard under Billing → Plan.
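To stay under the cap rather than react to it, a client can keep its own sliding window. Below is a minimal sketch that uses a sliding log of timestamps, which is one common way to implement a sliding window. This page does not document the server's exact algorithm, and `SlidingWindowLimiter` is a hypothetical helper. Set `limit` to the requests_per_minute value shown for your plan.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side sliding log: allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

The server's count is authoritative. Any other process drawing on the same budget is invisible to this limiter, so you still need to handle rate-limit errors.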

When the limit fires, the response is HTTP 429 with error.code = "RATE_LIMITED" and a Retry-After header (seconds). Both SDKs surface this as RateLimitExceededError with a structured retry_after field on the exception.
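Here is a minimal retry loop that honors the Retry-After delay. `RateLimitExceededError` below is a local stand-in, defined so that the sketch runs on its own. The real SDK class, its import path, and whether retry_after is an int or a float may differ. `call_with_retry` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceededError(Exception):
    """Stand-in for the SDK exception; the real class may differ."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after  # seconds, as in the Retry-After header


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, waiting retry_after seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except RateLimitExceededError as exc:
            if attempt >= max_attempts:
                raise  # give up and surface the last rate-limit error
            sleep(exc.retry_after)
            attempt += 1
```

Capping the number of attempts stops a persistently saturated budget from blocking forever. Adding a little random jitter to each sleep helps when several workers are throttled at the same moment.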

When each axis fires

PRS fires synchronously on the request that violates it. The body is parsed, and the request is rejected with either HTTP 400 and VALIDATION_ERROR (validation failure) or HTTP 413 and RESPONSE_TOO_LARGE (oversized body).
RPM fires when your sliding-window count exceeds your tier's value, regardless of the size of any individual call. Read and write traffic share the same per-minute budget unless your tier exposes split quotas.
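The two axes call for different reactions. An RPM rejection can succeed if resent unchanged after the wait. A PRS rejection fails again until the request is made smaller. Below is a minimal sketch of that dispatch that uses only the status codes and error.code values named on this page. `LimitAction` and `action_for` are hypothetical names. VALIDATION_ERROR may also cover invalid input unrelated to size, so treat that mapping as a hint.

```python
from enum import Enum
from typing import Optional


class LimitAction(Enum):
    WAIT_AND_RETRY = "wait_and_retry"  # RPM: the same request is fine after waiting
    SHRINK_REQUEST = "shrink_request"  # PRS: resending it unchanged fails again


def action_for(status: int, code: Optional[str]) -> Optional[LimitAction]:
    """Map an error response to the limit axis that most likely fired, or None."""
    if status == 429 and code == "RATE_LIMITED":
        return LimitAction.WAIT_AND_RETRY
    if status == 413 and code == "RESPONSE_TOO_LARGE":
        return LimitAction.SHRINK_REQUEST
    if status == 400 and code == "VALIDATION_ERROR":
        # May also signal invalid input that has nothing to do with size caps.
        return LimitAction.SHRINK_REQUEST
    return None
```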

Related

Quick Start: get your first API key and run a query
SDK Reference: full API documentation
Pricing: see your tier's RPM and storage allowance
