Rate Limits & Quotas

Understand how LLM Resayil enforces rate limits and quotas to ensure fair usage and service stability. Learn how to handle rate limit responses and implement backoff strategies.

Rate Limiting Overview

LLM Resayil applies rate limits to prevent abuse and ensure fair access for all users. Limits are enforced per user ID per minute via Laravel RateLimiter.

Why Rate Limits?

  • Fair Usage: Prevents any single account from monopolizing resources
  • Service Stability: Protects the infrastructure from being overwhelmed
  • Cost Control: Helps you avoid accidentally consuming excessive credits
  • Spam Prevention: Reduces abuse and malicious usage patterns

How Limits Are Applied

Limits are calculated in UTC time. Your quota resets at the beginning of each minute. When you exceed a limit, you'll receive a 429 Too Many Requests response and must wait for the quota to reset before retrying.

Note: Admin users automatically bypass rate limits. Contact support if you need higher limits for legitimate use cases.

Tier-Based Rate Limits

Your rate limits depend on your subscription tier. Here is a breakdown of limits for each tier:

Tier         Requests/Min   Requests/Day   Max Tokens/Request
Basic        10
Pro          30
Enterprise   60

Handling Rate Limit Responses

When you exceed your rate limit, you'll receive an HTTP 429 Too Many Requests response.

429 Response Format

The response includes a retry_after field indicating how many seconds to wait before retrying:

HTTP / JSON
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741305600

{
  "error": {
    "message": "Rate limit exceeded",
    "code": 429
  },
  "retry_after": 45
}

Rate Limit Response Headers

These HTTP headers are returned on every response so you can monitor usage and avoid hitting the limit proactively:

Header                 Description
X-RateLimit-Limit      Your maximum requests allowed per minute (e.g., 20)
X-RateLimit-Remaining  Requests remaining in the current minute window
X-RateLimit-Reset      Unix timestamp when the quota window resets
retry_after            Seconds to wait before retrying (in the JSON response body)

Monitoring Remaining Quota

Monitor X-RateLimit-Remaining on every response to implement proactive client-side throttling:

JavaScript
const response = await fetch(apiUrl, options);
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);

if (remaining < limit * 0.2) {
  console.warn(`Only ${remaining} requests remaining in this window!`);
}

Retry-After Guidance

When you receive a 429 response, use the retry_after value from the JSON body to determine how long to wait. Never retry immediately — always wait at least the specified number of seconds.

Python
import time

import requests

def call_with_retry(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            data = response.json()
            retry_after = data.get("retry_after", 60)
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue
        response.raise_for_status()
    raise Exception("Max retries exceeded")

Implementing Backoff Strategies

When rate limited, implement exponential backoff with jitter to gracefully retry requests. This is more reliable than immediate retries and helps distribute load across clients.

Exponential Backoff with Jitter

The recommended approach uses exponential backoff with jitter to avoid thundering herd problems:

Python
import random
import time

import requests

def make_api_call_with_backoff(api_url, data, max_retries=5):
    for attempt in range(max_retries):
        try:
            response = requests.post(api_url, json=data, timeout=10)
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Use retry_after from the body if available, else exponential backoff
                body = response.json()
                wait_time = body.get("retry_after") or (2 ** attempt) + random.uniform(0, 1)
                wait_time = min(wait_time, 60)  # cap the wait at 60 seconds
                print(f"Rate limited. Waiting {wait_time:.1f} seconds...")
                time.sleep(wait_time)
            else:
                response.raise_for_status()
        except Exception:
            if attempt < max_retries - 1:
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")

Backoff Strategy Details

  • Start Small: Begin with 1 second, then 2s, 4s, 8s, etc.
  • Add Jitter: Add random 0–1 second to prevent synchronized retries from multiple clients
  • Cap Max Wait: Do not exceed 60 seconds to avoid indefinite delays
  • Set Retry Limit: Set a maximum retry count (typically 5–10 attempts)

Best Practices for Rate Limit Management

1. Batch Requests

Combine multiple requests into a single API call when possible. This counts as one request toward your quota while processing more data.
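If the endpoint you call accepts multiple inputs per request (the exact request shape depends on the API), a small helper that groups items into fixed-size batches turns N calls into ceil(N / batch_size) calls against your quota. A sketch:

```python
def chunk(items, batch_size):
    """Split a list into consecutive batches of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 10 prompts at batch_size=4 -> 3 requests instead of 10
batches = chunk([f"prompt-{i}" for i in range(10)], 4)
```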

2. Implement Client-Side Rate Limiting

Do not rely solely on server-side limiting. Implement client-side throttling to stay below 80% of your quota:

JavaScript
// Limit to 80% of max requests to stay safe
const MAX_SAFE_RATE = 0.8;
const maxRequestsPerMinute = 20; // Your tier limit
const safeRate = maxRequestsPerMinute * MAX_SAFE_RATE; // 16 req/min
const delayBetweenRequests = 60000 / safeRate; // 3750ms

3. Cache Responses When Possible

Cache API responses to avoid repeated requests for the same queries. This dramatically reduces API usage and keeps you well within rate limits.
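One simple approach is an in-memory cache keyed by a hash of the request payload, with a time-to-live. This is only a sketch; a production client might prefer Redis or an HTTP caching layer:

```python
import hashlib
import json
import time

class TTLCache:
    """Cache API responses for `ttl` seconds, keyed by the request payload."""

    def __init__(self, ttl=300):
        self.ttl = ttl
        self._store = {}

    def _key(self, payload):
        # Stable hash of the JSON payload (sorted keys for determinism)
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def get(self, payload):
        """Return the cached response, or None if absent or expired."""
        entry = self._store.get(self._key(payload))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, payload, response):
        self._store[self._key(payload)] = (time.monotonic(), response)
```

Check the cache before each API call and only hit the network on a miss; identical queries within the TTL then cost zero requests.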

4. Stagger High-Volume Work

Spread requests over time rather than sending them all at once. This prevents burst limit violations while maintaining consistent throughput.
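One way to stagger a batch of calls is to space them evenly at your target rate instead of firing them all at once (a sketch; `send` stands in for whatever function performs the actual request):

```python
import time

def stagger(payloads, requests_per_minute, send):
    """Send payloads one at a time, evenly spaced to stay under the per-minute limit."""
    interval = 60.0 / requests_per_minute  # e.g. 10 req/min -> one call every 6s
    results = []
    for i, payload in enumerate(payloads):
        if i:
            time.sleep(interval)  # space calls out rather than bursting
        results.append(send(payload))
    return results
```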

5. Monitor and Alert

Set up alerts when X-RateLimit-Remaining drops below 20% of your limit. This gives you early warning to take action before you start receiving 429 errors.

6. Upgrade When Needed

If your application legitimately needs higher rate limits, upgrade your subscription tier or contact support about enterprise options.

Related Resources

Need more help?

Learn about common errors and how to troubleshoot them.

Go to Error Codes & Troubleshooting →