API Reference

Rate Limits

API rate limits by plan and how to handle rate limiting.

The Inception Agents API enforces rate limits to ensure fair usage and platform stability. Limits vary by plan and endpoint.


Rate Limit Headers

Every API response includes headers indicating your current rate limit status:

Header                   Description
X-RateLimit-Limit        Maximum requests allowed in the current window
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Unix timestamp (seconds) when the window resets

Example response headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 9742
X-RateLimit-Reset: 1740700800
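In application code these values can be read straight off the response. A minimal sketch in Python (a plain dict stands in for the response headers; real HTTP clients typically expose case-insensitive header lookup, which this sketch does not handle):

```python
def parse_rate_limit(headers):
    """Extract rate-limit state from response headers."""
    return {
        "limit": int(headers.get("X-RateLimit-Limit", "0")),
        "remaining": int(headers.get("X-RateLimit-Remaining", "0")),
        "reset_at": int(headers.get("X-RateLimit-Reset", "0")),  # Unix seconds
    }

state = parse_rate_limit({
    "X-RateLimit-Limit": "10000",
    "X-RateLimit-Remaining": "9742",
    "X-RateLimit-Reset": "1740700800",
})
```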

Per-Plan Limits

General API Limits

Plan        Requests / Hour    Requests / Day    Concurrent Requests
Free        1,000              10,000            10
Pro         10,000             100,000           50
Enterprise  Custom             Custom            Custom

Endpoint-Specific Limits

Certain endpoints have additional per-endpoint rate limits that apply independently from the general limits:

Endpoint                        Free          Pro           Notes
/api/v1/ingestion/crawl         10/hour       50/hour       Each call initiates an async job
/api/v1/knowledge/synthesize    100/hour      1,000/hour    Consumes LLM credits per request
/api/v1/analytics/events        10,000/hour   10,000/hour   High-throughput for edge workers

The /api/v1/analytics/events endpoint has an elevated limit across all plans to support real-time event ingestion from edge workers without throttling.


Burst Allowance

All plans include a burst allowance of up to 2x the hourly limit for short bursts lasting less than 10 seconds. This accommodates traffic spikes from batch operations or concurrent edge worker deployments.

For example, on the Pro plan (10,000 requests/hour), you can briefly send up to 20,000 requests in a 10-second window without being rate limited. Sustained traffic above the hourly limit will trigger rate limiting.
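A conservative client-side pacer can keep traffic inside the sustained hourly budget while still letting requests clump into short bursts, which the allowance above tolerates. A sketch (the class name and design are illustrative, not part of the API):

```python
import time
from collections import deque

class RollingHourLimiter:
    """Client-side pacing sketch: enforce the plan's hourly limit over a
    rolling one-hour window. Short clumps of requests inside that budget
    are fine, since the API's burst allowance tolerates spikes of up to
    2x the hourly rate for under 10 seconds."""

    def __init__(self, hourly_limit, window=3600.0):
        self.hourly_limit = hourly_limit
        self.window = window
        self.sent = deque()  # monotonic send times within the window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop send times that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.hourly_limit:
            return False  # caller should wait rather than send
        self.sent.append(now)
        return True
```

Passing `now` explicitly makes the pacer easy to unit-test; in production you would let it default to `time.monotonic()`.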


429 Too Many Requests

When you exceed the rate limit, the API returns a 429 status code:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1740700800
Retry-After: 60

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 60 seconds.",
  "retryAfter": 60,
  "statusCode": 429
}

The retryAfter field and Retry-After header indicate the number of seconds to wait before retrying.
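A small helper can normalize these two sources into one wait value. A sketch (the helper name is ours; it assumes headers as a dict and the parsed JSON body):

```python
def seconds_to_wait(headers, body, default=1):
    """Prefer the Retry-After header; fall back to the JSON retryAfter
    field, then to a default."""
    if "Retry-After" in headers:
        return int(headers["Retry-After"])
    if isinstance(body, dict) and "retryAfter" in body:
        return int(body["retryAfter"])
    return default
```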


Handling Rate Limits

Exponential Backoff

Implement exponential backoff with jitter to avoid thundering herd problems when rate limited:

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries) {
      throw new Error("Rate limit exceeded after maximum retries");
    }

    // Honor Retry-After, but also back off exponentially across attempts.
    const retryAfter = parseInt(response.headers.get("Retry-After") || "1", 10);
    const baseDelay = Math.max(retryAfter, 2 ** attempt) * 1000;
    const jitter = Math.random() * 1000;
    const delay = baseDelay + jitter;

    console.warn(`Rate limited. Retrying in ${Math.round(delay)}ms (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
The same pattern in Python:

import time
import random
import requests

def fetch_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers=headers)

        if response.status_code != 429:
            return response

        if attempt == max_retries:
            raise Exception("Rate limit exceeded after maximum retries")

        # Honor Retry-After, but also back off exponentially across attempts.
        retry_after = int(response.headers.get("Retry-After", "1"))
        base_delay = max(retry_after, 2 ** attempt)
        jitter = random.uniform(0, 1)
        delay = base_delay + jitter

        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(delay)

Proactive Rate Limit Checking

Check the X-RateLimit-Remaining header before making additional requests to avoid hitting the limit:

async function checkRateLimit(response) {
  const remaining = parseInt(response.headers.get("X-RateLimit-Remaining") || "0", 10);
  const resetAt = parseInt(response.headers.get("X-RateLimit-Reset") || "0", 10);

  if (remaining < 10) {
    const waitMs = (resetAt * 1000) - Date.now();
    if (waitMs > 0) {
      console.warn(`Rate limit nearly exhausted. ${remaining} remaining. Waiting ${waitMs}ms.`);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

Best Practices

Cache responses. Cache GET responses locally to reduce redundant API calls. Knowledge search results and analytics summaries are good candidates for caching with a TTL of 5-15 minutes.
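As a sketch of the caching idea (the class and method names are ours, not part of any SDK; not thread-safe):

```python
import time

class TTLCache:
    """Tiny in-process TTL cache for GET responses."""

    def __init__(self, ttl_seconds=300):  # 5 minutes, per the guidance above
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry is None or entry[0] <= now:
            self.entries.pop(key, None)  # evict expired entry
            return None
        return entry[1]

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.entries[key] = (now + self.ttl, value)
```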

Use bulk operations. When recording multiple analytics events, batch them into fewer requests where possible rather than sending one request per event.
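A sketch of the batching idea. The payload shape ({"events": [...]}) is illustrative only; the actual batch schema for /api/v1/analytics/events is not documented here:

```python
def chunk_events(events, batch_size=100):
    """Group individual analytics events into batches so that N events
    cost ceil(N / batch_size) requests instead of N requests.
    The {"events": [...]} payload shape is an assumption."""
    return [
        {"events": events[i:i + batch_size]}
        for i in range(0, len(events), batch_size)
    ]
```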

Implement backoff. Always implement exponential backoff with jitter when retrying after a 429 response. Do not retry immediately or at fixed intervals.

Monitor usage. Track your current rate limit consumption using the response headers. Set up alerts when X-RateLimit-Remaining drops below a threshold.

Check Dashboard usage. View your current API usage and rate limit status in Dashboard > Settings > Usage. This shows hourly and daily consumption broken down by endpoint.

Upgrade when needed. If you consistently hit rate limits, consider upgrading to a higher plan. Enterprise customers can negotiate custom limits for specific endpoints.


Rate Limits by Authentication Method

Auth Method     Rate Limit Pool    Notes
API Key         Per key            Each API key has its own rate limit window
Supabase JWT    Per user           Dashboard users share the tenant’s pool

If your integration uses multiple API keys, each key has its own independent rate limit window. This can be useful for distributing load across multiple edge worker deployments.
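A sketch of distributing load across keys (the Bearer authorization scheme and the key values are assumptions, not documented here):

```python
import itertools

class KeyRotator:
    """Round-robin over several API keys so each key's independent
    rate-limit window is used evenly. Key values are placeholders;
    the Bearer scheme is an assumption."""

    def __init__(self, api_keys):
        self._cycle = itertools.cycle(api_keys)

    def next_headers(self):
        # Return auth headers for the next request, rotating keys.
        return {"Authorization": f"Bearer {next(self._cycle)}"}
```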