Rate Limits
Rate limits protect the API from abuse and ensure fair usage across all accounts. Limits vary by plan and endpoint type.
Rate Limit Tiers
Each plan has different limits for requests per minute, daily messages, REPLR count, and knowledge documents. Upgrade your plan to increase your quotas.
| Plan | Requests/min | Messages/day | REPLRs | Knowledge Docs |
|---|---|---|---|---|
| Free | 30 | 50 | 3 | 5 |
| Plus | 120 | 500 | 25 | 50 |
| Pro | 300 | 2,000 | 100 | 200 |
| Creator | 600 | 10,000 | Unlimited | 500 |
Need higher limits? Upgrade your plan or contact us for custom enterprise quotas.
Rate Limit Headers
Every API response includes headers that tell you where you stand within the current rate limit window. Use these to proactively manage your request flow.
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum number of requests allowed in the current window. |
| X-RateLimit-Remaining | Number of requests remaining in the current window. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the current window resets. |
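As a rough illustration of how a client might read these headers, the sketch below pulls the three values out of a response and works out how many seconds remain in the window. The function name readRateLimit and the shape of the returned object are assumptions made for this example, not part of the API.

```javascript
// Hypothetical helper: read the rate limit headers from a fetch Response
// (or any object exposing headers.get) and compute the time until reset.
// A field is NaN when its header is missing from the response.
function readRateLimit(res, nowMs = Date.now()) {
  const num = (name) => {
    const raw = res.headers.get(name);
    return raw === null ? NaN : Number(raw);
  };
  const limit = num("X-RateLimit-Limit");
  const remaining = num("X-RateLimit-Remaining");
  const resetAt = num("X-RateLimit-Reset"); // Unix timestamp in seconds
  // Seconds until the window resets, never negative
  const resetInSeconds = Math.max(0, Math.ceil(resetAt - nowMs / 1000));
  return { limit, remaining, resetAt, resetInSeconds };
}
```

Passing the current time as a parameter keeps the function easy to test; in normal use you would call readRateLimit(res) on the value returned by fetch.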
Example response headers
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 117
X-RateLimit-Reset: 1741521600
429 Too Many Requests
When you exceed the rate limit, the API responds with a 429 status code. The response body includes a retry_after field indicating how many seconds to wait before retrying. The same value is also available in the Retry-After response header.
{
"error": "rate_limit_exceeded",
"message": "You have exceeded the rate limit of 120 requests per minute. Please wait before retrying.",
"retry_after": 32
}
Do not retry immediately. Always respect the retry_after value. Continuously hitting the rate limit may result in longer back-off periods or temporary suspension.
Best Practices
Follow these guidelines to avoid hitting rate limits and build a resilient integration.
- Implement exponential backoff. When a request fails with 429, wait for the retry_after period, then double the delay on each subsequent retry up to a maximum cap.
- Cache responses where possible. REPLR metadata, conversation lists, and configuration data change infrequently. Cache them locally to reduce redundant API calls.
- Use webhooks instead of polling. Instead of repeatedly polling for new messages, configure webhooks to receive real-time push notifications when events occur.
- Batch operations when available. Some endpoints support batch requests. Use them to combine multiple operations into a single API call.
- Monitor X-RateLimit-Remaining. Track the remaining quota from response headers and throttle your requests proactively before hitting the limit.
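The first guideline mentions doubling the delay up to a maximum cap. One possible way to compute such a schedule is sketched below. The function name backoffDelaySeconds and the 300-second default cap are assumptions chosen for illustration; the API does not prescribe a cap value.

```javascript
// Delay in seconds before retry number `attempt` (0-based).
// capSeconds is an assumed value, not something the API specifies.
function backoffDelaySeconds(retryAfter, attempt, capSeconds = 300) {
  const exponential = retryAfter * 2 ** attempt;
  // Cap the growth, but never wait less than the server asked for
  return Math.max(retryAfter, Math.min(exponential, capSeconds));
}

// Delays for five consecutive retries after a 429 with retry_after = 32
const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelaySeconds(32, n));
console.log(schedule); // [ 32, 64, 128, 256, 300 ]
```

Taking the larger of retry_after and the capped value means that a cap set too low can never cause a retry earlier than the server requested.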
Retry Logic Examples
Here is a reusable pattern for handling rate limits with exponential backoff. The function retries up to 3 times, respecting the retry_after value from the response.
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Out of retries: give up without sleeping again
    if (attempt === maxRetries) break;

    // Parse the retry delay (seconds) from the header, falling back to the body
    const retryAfter = parseInt(res.headers.get("Retry-After"), 10);
    const body = await res.json().catch(() => ({})); // tolerate a non-JSON body
    const delay = retryAfter || body?.retry_after || 1;

    // Exponential backoff: retry_after * 2^attempt
    const backoff = delay * Math.pow(2, attempt) * 1000;
    console.warn(`Rate limited. Retrying in ${backoff / 1000}s...`);
    await new Promise((resolve) => setTimeout(resolve, backoff));
  }
  throw new Error("Max retries exceeded");
}
// Usage
const res = await fetchWithRetry("https://api.replr.ai/v1/replrs", {
  headers: { Authorization: `Bearer ${API_KEY}` },
});
const data = await res.json();
Endpoint-Specific Limits
Some resource-intensive endpoints have their own rate limits, separate from the per-plan defaults shown above. These limits apply regardless of your plan unless noted otherwise.
| Endpoint | Free | Pro+ |
|---|---|---|
| Voice TTS | 100/min | 500/min |
| File uploads | 10/min | 10/min |
| Search | 60/min | 60/min |
Endpoint-specific limits are tracked independently from the global per-minute limit. A request can fail with 429 even if you have remaining global quota.
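Because these limits are counted separately, a client may want its own per-endpoint counter in addition to watching the global headers. The sketch below is one way to do that with a sliding 60-second window per endpoint. The class name EndpointLimiter and the keys "uploads" and "search" are hypothetical labels, and the sliding window is an assumption: the server may count its windows differently.

```javascript
// Client-side sliding-window counter, one window per endpoint key.
// This approximates the server's accounting; it does not replace 429 handling.
class EndpointLimiter {
  constructor(limitsPerMinute, now = () => Date.now()) {
    this.limits = limitsPerMinute; // e.g. { uploads: 10, search: 60 }
    this.now = now;
    this.sent = new Map(); // key -> timestamps (ms) of recent requests
  }

  // Returns true and records the request if the endpoint still has quota
  // within the last 60 seconds; returns false otherwise.
  tryAcquire(key) {
    const limit = this.limits[key];
    if (limit === undefined) return true; // no endpoint-specific limit known
    const cutoff = this.now() - 60_000;
    const recent = (this.sent.get(key) || []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    if (allowed) recent.push(this.now());
    this.sent.set(key, recent);
    return allowed;
  }
}

// Limits taken from the table above (identical in both columns)
const limiter = new EndpointLimiter({ uploads: 10, search: 60 });
if (limiter.tryAcquire("uploads")) {
  // Safe to send the upload request here
}
```

Each key keeps its own history, so exhausting the upload quota does not block search requests, which mirrors the independent tracking described above.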