Rate Limits
The Inception Agents API enforces rate limits to ensure fair usage and platform stability. Limits vary by plan and endpoint.
Rate Limit Headers
Every API response includes headers indicating your current rate limit status:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
Example response headers:
```
HTTP/1.1 200 OK
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 9742
X-RateLimit-Reset: 1740700800
```
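As a sketch, these headers can be read from any `fetch()` response. The helper below is illustrative (not part of an official SDK) and simply converts the header values into usable numbers and a `Date`:

```javascript
// Illustrative helper: parse the rate-limit headers documented above
// from a fetch() Response's headers object.
function parseRateLimitHeaders(headers) {
  return {
    limit: parseInt(headers.get("X-RateLimit-Limit") || "0", 10),
    remaining: parseInt(headers.get("X-RateLimit-Remaining") || "0", 10),
    // Convert the Unix reset timestamp (seconds) to a Date.
    resetAt: new Date(parseInt(headers.get("X-RateLimit-Reset") || "0", 10) * 1000),
  };
}
```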
Per-Plan Limits
General API Limits
| Plan | Requests / Hour | Requests / Day | Concurrent Requests |
|---|---|---|---|
| Free | 1,000 | 10,000 | 10 |
| Pro | 10,000 | 100,000 | 50 |
| Enterprise | Custom | Custom | Custom |
Endpoint-Specific Limits
Certain endpoints have additional per-endpoint rate limits that apply independently from the general limits:
| Endpoint | Free | Pro | Notes |
|---|---|---|---|
| /api/v1/ingestion/crawl | 10/hour | 50/hour | Each call initiates an async job |
| /api/v1/knowledge/synthesize | 100/hour | 1,000/hour | Consumes LLM credits per request |
| /api/v1/analytics/events | 10,000/hour | 10,000/hour | High-throughput for edge workers |
The /api/v1/analytics/events endpoint has an elevated limit across all plans to support real-time event ingestion from edge workers without throttling.
Burst Allowance
All plans include a burst allowance of up to 2x the hourly limit for short bursts lasting less than 10 seconds. This accommodates traffic spikes from batch operations or concurrent edge worker deployments.
For example, on the Pro plan (10,000 requests/hour), you can briefly send up to 20,000 requests in a 10-second window without being rate limited. Sustained traffic above the hourly limit will trigger rate limiting.
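One way to picture the burst allowance is a token bucket whose capacity is 2x the hourly limit, refilled at the sustained hourly rate. This is an illustrative client-side model, not the service's actual enforcement mechanism:

```javascript
// Illustrative token bucket: capacity of 2x the hourly limit permits
// short bursts, while the refill rate enforces the sustained limit.
class TokenBucket {
  constructor(ratePerHour) {
    this.capacity = ratePerHour * 2;          // burst allowance: 2x hourly limit
    this.tokens = this.capacity;
    this.refillPerMs = ratePerHour / 3600000; // sustained hourly rate
    this.lastRefill = Date.now();
  }
  tryRequest(now = Date.now()) {
    // Refill based on elapsed time, capped at the bucket capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.lastRefill) * this.refillPerMs
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // request allowed
    }
    return false;   // would be rate limited
  }
}
```

Under this model, a Pro-plan bucket starts with 20,000 tokens, so an instantaneous burst is allowed up to exactly 2x the hourly limit before requests are refused.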
429 Too Many Requests
When you exceed the rate limit, the API returns a 429 status code:
```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1740700800
Retry-After: 60

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 60 seconds.",
  "retryAfter": 60,
  "statusCode": 429
}
```
The retryAfter field and Retry-After header indicate the number of seconds to wait before retrying.
Handling Rate Limits
Exponential Backoff
Implement exponential backoff with jitter to avoid thundering herd problems when rate limited:
```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    if (attempt === maxRetries) {
      throw new Error("Rate limit exceeded after maximum retries");
    }
    // Honor Retry-After as the base, double it on each successive
    // attempt (exponential backoff), and add random jitter.
    const retryAfter = parseInt(response.headers.get("Retry-After") || "1", 10);
    const baseDelay = retryAfter * 1000 * 2 ** attempt;
    const jitter = Math.random() * 1000;
    const delay = baseDelay + jitter;
    console.warn(`Rate limited. Retrying in ${Math.round(delay)}ms (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```
The same pattern in Python:

```python
import random
import time

import requests


def fetch_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            raise Exception("Rate limit exceeded after maximum retries")
        # Honor Retry-After as the base, double it on each successive
        # attempt (exponential backoff), and add random jitter.
        retry_after = int(response.headers.get("Retry-After", "1"))
        delay = retry_after * 2 ** attempt + random.uniform(0, 1)
        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(delay)
```
Proactive Rate Limit Checking
Check the X-RateLimit-Remaining header before making additional requests to avoid hitting the limit:
```javascript
async function checkRateLimit(response) {
  const remaining = parseInt(response.headers.get("X-RateLimit-Remaining") || "0", 10);
  const resetAt = parseInt(response.headers.get("X-RateLimit-Reset") || "0", 10);
  if (remaining < 10) {
    const waitMs = resetAt * 1000 - Date.now();
    if (waitMs > 0) {
      console.warn(`Rate limit nearly exhausted. ${remaining} remaining. Waiting ${waitMs}ms.`);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```
Best Practices
Cache responses. Cache GET responses locally to reduce redundant API calls. Knowledge search results and analytics summaries are good candidates for caching with a TTL of 5-15 minutes.
Use bulk operations. When recording multiple analytics events, batch them into fewer requests where possible rather than sending one request per event.
Implement backoff. Always implement exponential backoff with jitter when retrying after a 429 response. Do not retry immediately or at fixed intervals.
Monitor usage. Track your current rate limit consumption using the response headers. Set up alerts when X-RateLimit-Remaining drops below a threshold.
Check Dashboard usage. View your current API usage and rate limit status in Dashboard > Settings > Usage. This shows hourly and daily consumption broken down by endpoint.
Upgrade when needed. If you consistently hit rate limits, consider upgrading to a higher plan. Enterprise customers can negotiate custom limits for specific endpoints.
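To illustrate the bulk-operations advice above, the sketch below accumulates analytics events and flushes them in batches. The batch payload shape (`{ events: [...] }`) and flush size are assumptions, not the documented request format for /api/v1/analytics/events:

```javascript
// Sketch of client-side event batching: N events become N / flushSize
// requests instead of N requests.
class EventBatcher {
  constructor(flushSize = 100) {
    this.flushSize = flushSize;
    this.queue = [];
    this.requestsSent = 0;
  }
  record(event) {
    this.queue.push(event);
    if (this.queue.length >= this.flushSize) this.flush();
  }
  flush() {
    if (this.queue.length === 0) return;
    // In a real worker this would POST { events: this.queue } as a
    // single request to the analytics endpoint.
    this.requestsSent += 1;
    this.queue = [];
  }
}
```

For example, recording 1,000 events with a flush size of 100 consumes only 10 requests from the rate limit window rather than 1,000.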
Rate Limits by Authentication Method
| Auth Method | Rate Limit Pool | Notes |
|---|---|---|
| API Key | Per key | Each API key has its own rate limit window |
| Supabase JWT | Per user | Dashboard users share the tenant’s pool |
If your integration uses multiple API keys, each key has its own independent rate limit window. This can be useful for distributing load across multiple edge worker deployments.
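A minimal way to spread load across independent key windows is round-robin key selection; the key names below are placeholders:

```javascript
// Rotate through multiple API keys, each of which has its own
// independent rate limit window. Key values are placeholders.
function makeKeyRotator(apiKeys) {
  let next = 0;
  return () => {
    const key = apiKeys[next];
    next = (next + 1) % apiKeys.length;
    return key;
  };
}
```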