Rate Limits
The Inception Agents API enforces rate limits to ensure fair usage and platform stability. Limits vary by plan and endpoint.
Rate Limit Headers
Every API response includes headers indicating your current rate limit status:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
Example response headers:
```
HTTP/1.1 200 OK
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 9742
X-RateLimit-Reset: 1740700800
```
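As a sketch, these headers can be read from any `fetch()` response. The helper below is illustrative (not part of an official SDK) and simply converts the header values into usable numbers and a `Date`:

```javascript
// Illustrative helper: parse the rate-limit headers documented above
// from a fetch() Response's headers object.
function parseRateLimitHeaders(headers) {
  return {
    limit: parseInt(headers.get("X-RateLimit-Limit") || "0", 10),
    remaining: parseInt(headers.get("X-RateLimit-Remaining") || "0", 10),
    // Convert the Unix reset timestamp (seconds) to a Date.
    resetAt: new Date(parseInt(headers.get("X-RateLimit-Reset") || "0", 10) * 1000),
  };
}
```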
Per-Plan Limits
General API Limits
| Plan | Requests / Hour | Requests / Day | Concurrent Requests |
|---|---|---|---|
| Free | 1,000 | 10,000 | 10 |
| Pro | 10,000 | 100,000 | 50 |
| Enterprise | Custom | Custom | Custom |
Endpoint-Specific Limits
Certain endpoints have additional per-endpoint rate limits that apply independently from the general limits:
| Endpoint | Free | Pro | Notes |
|---|---|---|---|
| /api/v1/ingestion/crawl | 10/hour | 50/hour | Each call initiates an async job |
| /api/v1/knowledge/synthesize | 100/hour | 1,000/hour | Consumes LLM credits per request |
| /api/v1/analytics/events | 10,000/hour | 10,000/hour | High-throughput for edge workers |
The /api/v1/analytics/events endpoint has an elevated limit across all plans to support real-time event ingestion from edge workers without throttling.
Burst Allowance
All plans include a burst allowance of up to 2x the hourly limit for short bursts lasting less than 10 seconds. This accommodates traffic spikes from batch operations or concurrent edge worker deployments.
For example, on the Pro plan (10,000 requests/hour), you can briefly send up to 20,000 requests in a 10-second window without being rate limited. Sustained traffic above the hourly limit will trigger rate limiting.
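One way to picture the burst allowance is a token bucket whose capacity is 2x the hourly limit, refilled at the sustained hourly rate. This is an illustrative client-side model, not the service's actual enforcement mechanism:

```javascript
// Illustrative token bucket: capacity of 2x the hourly limit permits
// short bursts, while the refill rate enforces the sustained limit.
class TokenBucket {
  constructor(ratePerHour) {
    this.capacity = ratePerHour * 2;          // burst allowance: 2x hourly limit
    this.tokens = this.capacity;
    this.refillPerMs = ratePerHour / 3600000; // sustained hourly rate
    this.lastRefill = Date.now();
  }
  tryRequest(now = Date.now()) {
    // Refill based on elapsed time, capped at the bucket capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.lastRefill) * this.refillPerMs
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // request allowed
    }
    return false;   // would be rate limited
  }
}
```

Under this model, a Pro-plan bucket starts with 20,000 tokens, so an instantaneous burst is allowed up to exactly 2x the hourly limit before requests are refused.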
429 Too Many Requests
When you exceed the rate limit, the API returns a 429 status code:
```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1740700800
Retry-After: 60

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 60 seconds.",
  "retryAfter": 60,
  "statusCode": 429
}
```
The retryAfter field and Retry-After header indicate the number of seconds to wait before retrying.
Handling Rate Limits
Exponential Backoff
Implement exponential backoff with jitter to avoid thundering herd problems when rate limited:
```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    if (attempt === maxRetries) {
      throw new Error("Rate limit exceeded after maximum retries");
    }
    // Honor Retry-After as the base, double it on each successive
    // attempt (exponential backoff), and add random jitter.
    const retryAfter = parseInt(response.headers.get("Retry-After") || "1", 10);
    const baseDelay = retryAfter * 1000 * 2 ** attempt;
    const jitter = Math.random() * 1000;
    const delay = baseDelay + jitter;
    console.warn(`Rate limited. Retrying in ${Math.round(delay)}ms (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```
The same pattern in Python:

```python
import random
import time

import requests


def fetch_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            raise Exception("Rate limit exceeded after maximum retries")
        # Honor Retry-After as the base, double it on each successive
        # attempt (exponential backoff), and add random jitter.
        retry_after = int(response.headers.get("Retry-After", "1"))
        delay = retry_after * 2 ** attempt + random.uniform(0, 1)
        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(delay)
```
Proactive Rate Limit Checking
Check the X-RateLimit-Remaining header before making additional requests to avoid hitting the limit:
```javascript
async function checkRateLimit(response) {
  const remaining = parseInt(response.headers.get("X-RateLimit-Remaining") || "0", 10);
  const resetAt = parseInt(response.headers.get("X-RateLimit-Reset") || "0", 10);
  if (remaining < 10) {
    const waitMs = resetAt * 1000 - Date.now();
    if (waitMs > 0) {
      console.warn(`Rate limit nearly exhausted. ${remaining} remaining. Waiting ${waitMs}ms.`);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```
Best Practices
Cache responses. Cache GET responses locally to reduce redundant API calls. Knowledge search results and analytics summaries are good candidates for caching with a TTL of 5-15 minutes.
Use bulk operations. When recording multiple analytics events, batch them into fewer requests where possible rather than sending one request per event.
Implement backoff. Always implement exponential backoff with jitter when retrying after a 429 response. Do not retry immediately or at fixed intervals.
Monitor usage. Track your current rate limit consumption using the response headers. Set up alerts when X-RateLimit-Remaining drops below a threshold.
Check Dashboard usage. View your current API usage and rate limit status in Dashboard > Settings > Usage. This shows hourly and daily consumption broken down by endpoint.
Upgrade when needed. If you consistently hit rate limits, consider upgrading to a higher plan. Enterprise customers can negotiate custom limits for specific endpoints.
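To illustrate the bulk-operations advice above, the sketch below accumulates analytics events and flushes them in batches. The batch payload shape (`{ events: [...] }`) and flush size are assumptions, not the documented request format for /api/v1/analytics/events:

```javascript
// Sketch of client-side event batching: N events become N / flushSize
// requests instead of N requests.
class EventBatcher {
  constructor(flushSize = 100) {
    this.flushSize = flushSize;
    this.queue = [];
    this.requestsSent = 0;
  }
  record(event) {
    this.queue.push(event);
    if (this.queue.length >= this.flushSize) this.flush();
  }
  flush() {
    if (this.queue.length === 0) return;
    // In a real worker this would POST { events: this.queue } as a
    // single request to the analytics endpoint.
    this.requestsSent += 1;
    this.queue = [];
  }
}
```

For example, recording 1,000 events with a flush size of 100 consumes only 10 requests from the rate limit window rather than 1,000.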
Rate Limits by Authentication Method
| Auth Method | Rate Limit Pool | Notes |
|---|---|---|
| API Key | Per key | Each API key has its own rate limit window |
| Supabase JWT | Per user | Dashboard users share the tenant’s pool |
If your integration uses multiple API keys, each key has its own independent rate limit window. This can be useful for distributing load across multiple edge worker deployments.
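A minimal way to spread load across independent key windows is round-robin key selection; the key names below are placeholders:

```javascript
// Rotate through multiple API keys, each of which has its own
// independent rate limit window. Key values are placeholders.
function makeKeyRotator(apiKeys) {
  let next = 0;
  return () => {
    const key = apiKeys[next];
    next = (next + 1) % apiKeys.length;
    return key;
  };
}
```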