Usage & Analytics

Learn how to monitor your credit consumption, understand how tokens are counted, and optimize your requests to get the most out of your balance.

Checking Your Current Balance


You can programmatically query your current credit balance at any time using the following endpoint:

HTTP Request
GET https://llmapi.resayil.io/api/billing/subscription
Authorization: Bearer YOUR_API_KEY

A successful request returns a JSON response containing your subscription details and remaining credit balance:

JSON Response
{
  "tier": "free",
  "status": "active",
  "expires_at": null,
  "credits": 842.50
}

The credits field represents your currently available balance. Credits are deducted with each request based on token count and the model used.
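As a sketch, the balance check above can be scripted with Python's standard library. The endpoint URL and response shape are taken from this page; the helper names and `API_KEY` placeholder are illustrative, not part of the API:

```python
# Sketch: query the remaining credit balance (endpoint from this page).
# API_KEY is a placeholder; helper names are illustrative.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
URL = "https://llmapi.resayil.io/api/billing/subscription"

def parse_balance(raw_json: str) -> float:
    """Extract the 'credits' field from a subscription response body."""
    return float(json.loads(raw_json)["credits"])

def fetch_subscription() -> dict:
    """Fetch the subscription document using a bearer-token header."""
    req = urllib.request.Request(URL, headers={"Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `parse_balance` on the JSON body shown above would return the available credit balance as a float.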

How Tokens Are Counted

The system measures your consumption in tokens. One token is approximately four characters of English text, or roughly three-quarters of a word. Every request involves two counted token types, reported together with their sum:

  • prompt_tokens — Tokens in the message(s) sent to the model, including system prompts and conversation history.
  • completion_tokens — Tokens in the response generated by the model.
  • total_tokens — The sum of prompt_tokens + completion_tokens. This is the figure used for credit deduction.

These values appear in the usage field of every response:

JSON — usage field
{
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 118,
    "total_tokens": 160
  }
}
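The usage field can be read directly from a parsed response. The helpers below are an illustrative sketch: one extracts the three counts, and the other gives a rough pre-request estimate using the four-characters-per-token rule of thumb stated above:

```python
def token_usage(response: dict) -> tuple[int, int, int]:
    """Return (prompt_tokens, completion_tokens, total_tokens) from a response."""
    u = response["usage"]
    return u["prompt_tokens"], u["completion_tokens"], u["total_tokens"]

def estimate_tokens(text: str) -> int:
    """Rough pre-request estimate: ~4 English characters per token."""
    return max(1, round(len(text) / 4))

# Example response matching the usage field shown above.
resp = {"usage": {"prompt_tokens": 42, "completion_tokens": 118, "total_tokens": 160}}
```

The estimate is only a planning aid; the authoritative counts are always the ones returned in the usage field.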

Credit Deduction Formula

Tokens are not deducted at a flat rate across all models. The system applies a per-model multiplier that reflects each model's operational cost:

credits_deducted = total_tokens × model_multiplier
Model Type        Multiplier Range   Examples
Standard models   0.5× – 1.5×        Mistral, Llama 3, Neural Chat
Premium models    2× – 3.5×          GPT-4o, Claude 3.5, Gemini Pro

Example: A request with 200 total tokens using a standard model at 1× multiplier deducts 200 credits. The same request through a premium model at 3× deducts 600 credits.

Note: You can find the exact multiplier for each model on the Available Models page.
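The deduction formula translates directly into code. This minimal sketch reproduces the worked example above; actual multiplier values come from the Available Models page:

```python
def credits_deducted(total_tokens: int, model_multiplier: float) -> float:
    """Apply the per-model multiplier to the request's total token count."""
    return total_tokens * model_multiplier

# 200 tokens on a standard model at 1x costs 200 credits;
# the same request on a premium model at 3x costs 600 credits.
```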

Streaming vs Non-Streaming Token Counting

LLM Resayil supports both full-response (non-streaming) and real-time streaming (stream: true) modes. Token counting is identical in both modes — the only difference is how you receive the data:

  • Non-streaming: The full response arrives in a single message and always includes the usage field.
  • Streaming: The response arrives as incremental chunks via Server-Sent Events. The usage field appears in the final chunk before the [DONE] signal.

Credit deduction occurs after the full response is complete in both cases. The total token count is not affected by the delivery method.
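As an illustrative sketch of the streaming case, the snippet below scans Server-Sent Events lines and captures the usage object from the final chunk before the [DONE] signal. The `data:` line prefix follows the standard SSE convention; any chunk field names other than usage are assumptions, not documented here:

```python
import json

def usage_from_stream(sse_lines):
    """Scan SSE lines and return the usage dict from the last data chunk
    that carries one, stopping at the [DONE] sentinel."""
    usage = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "usage" in chunk:
            usage = chunk["usage"]
    return usage
```

In non-streaming mode no such scan is needed, since the single response message always carries the usage field directly.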

API Key Last-Used Tracking

The system maintains a last_used_at timestamp for each API key, updated with per-minute granularity to minimize database write pressure. You can see this value on the API Keys page in your dashboard.

This indicator is useful for spotting unusual activity or verifying that your application is using the correct key. If you see a key that has not been used in a long time, consider revoking it and generating a fresh one.

Tips for Optimizing Credit Usage

Follow these practices to reduce your per-request cost without sacrificing quality:

  1. Keep system prompts concise: System messages are counted in every request. Write them clearly but briefly.
  2. Use standard models for simple tasks: Standard models with a 0.5×–1.5× multiplier are well-suited for most general tasks and cost significantly less than premium models.
  3. Limit max_tokens: Setting a sensible max_tokens value prevents the model from generating unnecessarily long responses.
  4. Trim conversation history: The more historical messages you include in each request, the higher your prompt_tokens cost. Remove older turns that are no longer relevant.
  5. Reserve premium models for complex tasks: Use GPT-4o or Claude-class models only when the task genuinely demands advanced reasoning capabilities.

Tip: Benchmark different models against your tasks and compare response quality to cost. In many cases a faster, cheaper standard model performs the job just as well.
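Tip 4, trimming conversation history, can be sketched as a small helper that keeps the system prompt while dropping older turns. The message shape follows the common role/content chat convention; the helper name and default are illustrative:

```python
def trim_history(messages: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep all system messages plus only the newest max_turns other
    messages, reducing prompt_tokens on long conversations."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```

Calling this before each request caps the history contribution to prompt_tokens while preserving the instructions in the system prompt.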

Need more credits?

Learn how to purchase additional top-up credit packs.

Go to Top-Up Credits →