Available Models

59+ AI Models at Your Fingertips

OpenAI-compatible API — switch models instantly by changing a single field.


Credit Multiplier System

Every request consumes credits equal to the number of tokens processed multiplied by the model's credit multiplier. The base rate is 1.0×; multipliers range from 0.5× for lightweight models up to 3.5× for the largest frontier models.

Multiplier | Tier | Description
0.5× | Standard — Lightweight | Small, ultra-fast models ideal for simple tasks
1.0× | Frontier — Embedding | Lightweight embedding models
1.5× | Standard — Mid | Mid-size models with strong performance
2.5× | Frontier — Mid | Balanced frontier models for quality and cost
3.5× | Frontier — Large | Highest-performance, largest-scale models

Example: A request consuming 1,000 tokens on a 1.5× model deducts 1,500 credits. Monitor exact consumption via the usage field in every response.
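
As a quick sanity check, the deduction can be reproduced client-side. The snippet below is a minimal sketch (not an official SDK helper); the multiplier value comes from the tier table above.

python
# Minimal sketch: estimate the credit cost of a request.
# credits deducted = tokens processed x model credit multiplier
def estimate_credits(total_tokens: int, multiplier: float) -> float:
    return total_tokens * multiplier

# Worked example from above: 1,000 tokens on a 1.5x model -> 1,500 credits.
print(estimate_credits(1_000, 1.5))  # 1500.0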

Standard Models (16)

Models optimized for speed and efficient credit usage. The base rate is 1 credit per token, adjusted by the model's multiplier.

llama3.2:3b
Lightweight 3B parameter model for fast inference
chat Small
0.5× credits per 1K tokens
Context: 16K tokens
llava:7b
Large Language and Vision Assistant
vision Medium
1.0× credits per 1K tokens
Context: 4K tokens
smollm2:135m
Tiny 135M parameter model for quick tasks
chat Small
0.5× credits per 1K tokens
Context: 8K tokens
qwen2.5-coder:14b
Code-focused model with 14B parameters
code Medium
1.5× credits per 1K tokens
Context: 33K tokens
mistral-small3.2:24b-instruct-2506-q4_K_M
High-performance 24B parameter model
chat Medium
1.5× credits per 1K tokens
Context: 33K tokens
phi3.5:mini
Small model with large context window
chat Small
0.5× credits per 1K tokens
Context: 128K tokens
gemma2:9b
Google Gemma 2 with 9B parameters
chat Medium
1.5× credits per 1K tokens
Context: 8K tokens
deepseek-coder:6.7b
Code-focused model with 6.7B parameters
code Small
1.5× credits per 1K tokens
Context: 16K tokens
nomic-embed-text
Text embedding model
embedding Small
0.5× credits per 1K tokens
Context: 8K tokens
bge-m3
Dense, sparse, and multi-vector embeddings
embedding Small
1.5× credits per 1K tokens
Context: 8K tokens
all-minilm
Lightweight embedding model
embedding Small
0.5× credits per 1K tokens
Context: 1K tokens
snowflake-arctic-embed
High-quality embedding model
embedding Small
1.5× credits per 1K tokens
Context: 1K tokens
mistral:7b
Original Mistral with 7B parameters
chat Medium
1.5× credits per 1K tokens
Context: 33K tokens
starcoder2:3b
Code generation model
code Small
0.5× credits per 1K tokens
Context: 16K tokens
codellama:7b
Code-specific Llama model
code Medium
0.5× credits per 1K tokens
Context: 16K tokens
qwen3:14b
Qwen 3 with 14B parameters, strong reasoning
thinking Medium
1.0× credits per 1K tokens
Context: 33K tokens

Frontier Models (43)

Frontier models provide access to some of the most powerful AI models available, with tens or hundreds of billions of parameters. Most multipliers in this tier range from 2.5× to 3.5×; frontier embedding models run at 1.0×.

glm-4.7
vision Small
2.0× credits per 1K tokens
qwen3-next:80b
Qwen 3 Next with 80B parameters, strong reasoning
thinking Medium
1.0× credits per 1K tokens
Context: 33K tokens
glm-4.7-flash
Zhipu AI multimodal model
vision Medium
2.5× credits per 1K tokens
Context: 128K tokens
qwen3-30b
Qwen 3 with 30B parameters
thinking Large
2.5× credits per 1K tokens
Context: 33K tokens
gpt-oss:20b
OpenAI open source model
chat Medium
2.5× credits per 1K tokens
Context: 128K tokens
qwen3-vl:32b
Qwen3 Vision-Language model
vision Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen3-vl:235b-instruct
Qwen3 Vision-Language 235B flagship multimodal model
vision Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen3.5
Qwen 3.5 with 397B parameters (MoE)
thinking Large
3.5× credits per 1K tokens
Context: 33K tokens
devstral-2:123b
Mistral Devstral 2 with 123B parameters
code Large
3.5× credits per 1K tokens
Context: 128K tokens
deepseek-v3.1:671b
DeepSeek V3.1 with 671B parameters
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
deepseek-v3.2
DeepSeek V3.2 latest version
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3.2:11b
Llama 3.2 with 11B parameters
vision Medium
2.5× credits per 1K tokens
Context: 16K tokens
llama3.2:70b
Llama 3.2 with 70B parameters
chat Large
3.5× credits per 1K tokens
Context: 16K tokens
llama3.1:70b
Llama 3.1 with 70B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3.1:405b
Llama 3.1 with 405B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
gemma2:27b
Google Gemma 2 with 27B parameters
chat Medium
3.5× credits per 1K tokens
Context: 8K tokens
mixtral:8x7b
Mistral Mixtral 8x7B MoE
chat Medium
3.5× credits per 1K tokens
Context: 33K tokens
mixtral:8x22b
Mistral Mixtral 8x22B MoE
chat Medium
3.5× credits per 1K tokens
Context: 66K tokens
mistral-large:24b
Mistral Large with 24B parameters
chat Medium
2.5× credits per 1K tokens
Context: 33K tokens
mistral-nemo:12b
Mistral Nemo with 12B parameters
chat Medium
2.5× credits per 1K tokens
Context: 128K tokens
codestral:22b
Mistral Codestral for code generation
code Medium
2.5× credits per 1K tokens
Context: 33K tokens
deepseek-coder:33b
DeepSeek Coder with 33B parameters
code Large
3.5× credits per 1K tokens
Context: 16K tokens
deepseek-chat:671b
DeepSeek Chat with 671B parameters
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen2:72b
Qwen 2 with 72B parameters
chat Large
3.5× credits per 1K tokens
Context: 33K tokens
qwen2.5:32b
Qwen 2.5 with 32B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
yi:34b
01.AI Yi with 34B parameters
chat Large
3.5× credits per 1K tokens
Context: 16K tokens
deepseek-v2.5
DeepSeek V2.5 with MoE architecture
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3-gradient:70b
Llama 3 with extended context
chat Large
3.5× credits per 1K tokens
Context: 262K tokens
command-r:35b
Cohere Command R for RAG
tools Large
3.5× credits per 1K tokens
Context: 128K tokens
command-r-plus:104b
Cohere Command R+ for advanced RAG
tools Large
3.5× credits per 1K tokens
Context: 128K tokens
firefunction-v2:18b
Fireworks FireFunction for function calling
tools Medium
2.5× credits per 1K tokens
Context: 8K tokens
nomic-embed:27m
Nomic Embed for text embeddings
embedding Small
1.0× credits per 1K tokens
Context: 8K tokens
gte-qwen:7m
GTE Qwen embedding model
embedding Small
1.0× credits per 1K tokens
Context: 33K tokens
bge-large:335m
BGE Large embedding model
embedding Small
2.5× credits per 1K tokens
Context: 1K tokens
e5-mistral:7b
E5 Mistral embedding model
embedding Medium
2.5× credits per 1K tokens
Context: 1K tokens
nvidia-embed:1b
NVIDIA NeMo Embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
all-minilm-l6:22m
All-MiniLM L6 v2 embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
gte-base:110m
GTE Base embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
snowflake-arctic-embed-l:335m
Snowflake Arctic Embed Large
embedding Small
2.5× credits per 1K tokens
Context: 1K tokens
bge-small:8m
BGE Small embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
minilm-l12:39m
MiniLM L12 embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
glm-5
Zhipu AI GLM-5 latest multimodal model
vision Small
3.5× credits per 1K tokens
Context: 128K tokens
kimi-k2.5
Moonshot Kimi K2.5 reasoning model
thinking Small
3.5× credits per 1K tokens
Context: 128K tokens

Models API Endpoint

Retrieve the full list of available models via the following OpenAI-compatible endpoint:

GET https://llmapi.resayil.io/v1/models
bash
curl https://llmapi.resayil.io/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
json
{
  "object": "list",
  "data": [
    {
      "id": "llama3.2:3b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "llm-resayil"
    },
    {
      "id": "qwen3.5:397b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "llm-resayil"
    }
  ]
}

Note: The endpoint is also accessible at GET /api/v1/models. Both paths return the same list.
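
Because the API is OpenAI-compatible, the standard OpenAI Python SDK can also be pointed at this base URL to list models. The snippet below is a sketch, assuming your key is stored in an environment variable named LLM_API_KEY (a name chosen here for illustration).

python
import os

from openai import OpenAI

# Sketch: reuse the OpenAI client against the OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://llmapi.resayil.io/v1",
    api_key=os.environ["LLM_API_KEY"],  # illustrative variable name
)

# GET /v1/models — iterate over the returned model list.
for model in client.models.list():
    print(model.id)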

Sending Requests

All models share a single OpenAI-compatible endpoint. Simply change the model field to switch models:

POST https://llmapi.resayil.io/v1/chat/completions
json
{
  "model": "mistral-small3.2:24b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 500,
  "stream": false
}
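
The same request can be made through the OpenAI Python SDK; the sketch below mirrors the JSON body above, and switching models only requires changing the model argument.

python
from openai import OpenAI

# Sketch: POST /v1/chat/completions via the OpenAI Python SDK.
client = OpenAI(base_url="https://llmapi.resayil.io/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="mistral-small3.2:24b",  # change only this field to switch models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=500,
)

print(response.choices[0].message.content)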

Every response includes a usage field showing exact token consumption:

json
"usage": {
  "prompt_tokens": 15,
  "completion_tokens": 142,
  "total_tokens": 157
}
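
To track spend programmatically, the usage block can be combined with the model's multiplier. The snippet below is a sketch; the 1.5 multiplier is hard-coded for illustration and should be looked up for the model you actually called.

python
# Sketch: reproduce the credit deduction from a response's usage block.
usage = {"prompt_tokens": 15, "completion_tokens": 142, "total_tokens": 157}
multiplier = 1.5  # illustrative; use the calling model's multiplier

credits_deducted = usage["total_tokens"] * multiplier
print(credits_deducted)  # 235.5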

Model Availability & Status

Full Access Across All Tiers

All subscription tiers have immediate access to all 59 models with no restrictions. The only differentiator is your available credit balance.

Model Updates

We continuously update the model catalog to include the latest and most capable models. New models appear immediately in GET /v1/models results and are ready to use.

Deprecations

If a model is deprecated, at least 30 days' notice is provided along with migration guidance. Notifications are sent via email and dashboard alerts.

Related Resources

Ready to start building?

Learn about the credit system and billing to understand costs.

Go to Billing & Credits →