Understand how LLM Resayil enforces rate limits and quotas to ensure fair usage and service stability. Learn how to handle rate limit responses and implement backoff strategies.
LLM Resayil applies rate limits to prevent abuse and ensure fair access for all users. Limits are enforced per user ID per minute via Laravel RateLimiter.
Limits are calculated in UTC time. Your quota resets at the beginning of each minute. When you exceed a limit, you'll receive a 429 Too Many Requests response and must wait for the quota to reset before retrying.
Note: Admin users automatically bypass rate limits. Contact support if you need higher limits for legitimate use cases.
Your rate limits depend on your subscription tier. Here is a breakdown of limits for each tier:
| Tier | Requests/Min | Requests/Day | Max Tokens/Request |
|---|---|---|---|
| Basic | 10 | — | — |
| Pro | 30 | — | — |
| Enterprise | 60 | — | — |
When you exceed your rate limit, you'll receive an HTTP 429 Too Many Requests response.
The response includes a retry_after field indicating how many seconds to wait before retrying:
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1741305600

{
  "error": {
    "message": "Rate limit exceeded",
    "code": 429
  },
  "retry_after": 45
}
```
These HTTP headers are returned on every response so you can monitor usage and avoid hitting the limit proactively:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Your maximum requests allowed per minute (e.g., 30 on the Pro tier) |
| X-RateLimit-Remaining | Requests remaining in the current minute window |
| X-RateLimit-Reset | Unix timestamp when the quota window resets |
| retry_after | Seconds to wait before retrying (in the JSON response body) |
Monitor X-RateLimit-Remaining on every response to implement proactive client-side throttling:
```javascript
const response = await fetch(apiUrl, options);
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);

if (remaining < limit * 0.2) {
  console.warn(`Only ${remaining} requests remaining in this window!`);
}
```
When you receive a 429 response, use the retry_after value from the JSON body to determine how long to wait. Never retry immediately; always wait at least the specified number of seconds.
```python
import time

import requests


def call_with_retry(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            data = response.json()
            retry_after = data.get("retry_after", 60)
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue
        response.raise_for_status()
    raise Exception("Max retries exceeded")
```
When rate limited, retry with exponential backoff plus jitter rather than retrying immediately. Backing off spreads retries over time, and the random jitter staggers retries across clients so they don't all hit the API at the same instant (the thundering herd problem):
```python
import random
import time

import requests


def make_api_call_with_backoff(api_url, data, max_retries=5):
    for attempt in range(max_retries):
        try:
            response = requests.post(api_url, json=data, timeout=10)
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Prefer retry_after from the body; fall back to exponential backoff
                body = response.json()
                wait_time = body.get("retry_after") or ((2 ** attempt) + random.uniform(0, 1))
                print(f"Rate limited. Waiting {wait_time:.1f} seconds...")
                time.sleep(wait_time)
            else:
                response.raise_for_status()
        except Exception:
            if attempt < max_retries - 1:
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")
```
Combine multiple requests into a single API call when possible. This counts as one request toward your quota while processing more data.
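As an illustration of this idea, the sketch below groups items into fixed-size batches so each group maps to a single API call. The `chunk` and `send_batch` names are illustrative, not part of the LLM Resayil API; check your API's actual batching support for the real request shape.

```python
def chunk(items, batch_size):
    """Split items into batches; each batch will become ONE API request."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batched(prompts, send_batch, batch_size=10):
    """Process all prompts, where send_batch(batch) performs one API call.

    With batch_size=10, 100 prompts cost 10 requests toward your quota
    instead of 100.
    """
    results = []
    for batch in chunk(prompts, batch_size):
        results.extend(send_batch(batch))  # one request per batch
    return results
```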
Do not rely solely on server-side limiting. Implement client-side throttling to stay below 80% of your quota:
```javascript
// Limit to 80% of max requests to stay safe
const MAX_SAFE_RATE = 0.8;
const maxRequestsPerMinute = 30; // Your tier limit (Pro)
const safeRate = maxRequestsPerMinute * MAX_SAFE_RATE; // 24 req/min
const delayBetweenRequests = 60000 / safeRate; // 2500ms
```
Cache API responses to avoid repeated requests for the same queries. This dramatically reduces API usage and keeps you well within rate limits.
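A minimal in-memory cache with a time-to-live (TTL) is often enough; identical queries within the TTL window return the stored result and cost zero quota. This is a sketch, not part of the API client; for production you might prefer a shared cache such as Redis.

```python
import time


class TTLCache:
    """Tiny in-memory cache: repeated queries within ttl_seconds skip the API."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired; caller should re-fetch
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())


def cached_call(cache, query, fetch):
    """fetch(query) hits the API; cache hits avoid the request entirely."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = fetch(query)
    cache.set(query, result)
    return result
```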
Spread requests over time rather than sending them all at once. This prevents burst limit violations while maintaining consistent throughput.
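One way to sketch this pacing (a client-side helper, not something the API provides) is to enforce a minimum interval between sends, derived from your per-minute budget:

```python
import time


class RequestPacer:
    """Enforce a minimum gap between requests to smooth out bursts."""

    def __init__(self, requests_per_minute):
        self.min_interval = 60.0 / requests_per_minute
        self._last_sent = None

    def wait(self):
        """Sleep just long enough to stay under the per-minute rate."""
        now = time.monotonic()
        if self._last_sent is not None:
            elapsed = now - self._last_sent
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self._last_sent = time.monotonic()
```

Call `pacer.wait()` immediately before each API request; a burst of calls is then automatically spread out at a steady rate instead of hitting the limit all at once.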
Set up alerts when X-RateLimit-Remaining drops below 20% of your limit.
This gives you early warning to take action before you start receiving 429 errors.
If your application legitimately needs higher rate limits, upgrade your subscription tier or contact support about enterprise options.