Available Models

59+ AI Models at Your Fingertips

OpenAI-compatible API — switch models instantly by changing a single field.


Credit Multiplier System

Every request consumes credits equal to the number of tokens processed multiplied by the model's credit multiplier. The base rate is 1.0×; multipliers range from 0.5× for lightweight models up to 3.5× for the largest frontier models.

Multiplier | Tier | Description
0.5× | Standard — Lightweight | Small, ultra-fast models ideal for simple tasks
1.0× | Frontier — Embedding | Lightweight embedding models
1.5× | Standard — Mid | Mid-size models with strong performance
2.5× | Frontier — Mid | Balanced frontier models for quality and cost
3.5× | Frontier — Large | Highest-performance, largest-scale models

Example: A request consuming 1,000 tokens on a 1.5× model deducts 1,500 credits. Monitor exact consumption via the usage field in every response.
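
As a quick sanity check, the deduction can be reproduced client-side. The snippet below is a minimal sketch (not an official SDK helper); the multiplier value comes from the tier table above.

python
# Minimal sketch: estimate the credit cost of a request.
# credits deducted = tokens processed x model credit multiplier
def estimate_credits(total_tokens: int, multiplier: float) -> float:
    return total_tokens * multiplier

# Worked example from above: 1,000 tokens on a 1.5x model -> 1,500 credits.
print(estimate_credits(1_000, 1.5))  # 1500.0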

Standard Models (16)

Models optimized for speed and efficient credit usage. The base rate is 1 credit per token, adjusted by the model's multiplier.

llama3.2:3b
Lightweight 3B parameter model for fast inference
chat Small
0.5× credits per 1K tokens
Context: 16K tokens
llava:7b
Large Language and Vision Assistant
vision Medium
1.0× credits per 1K tokens
Context: 4K tokens
smollm2:135m
Tiny 135M parameter model for quick tasks
chat Small
0.5× credits per 1K tokens
Context: 8K tokens
qwen2.5-coder:14b
Code-focused model with 14B parameters
code Medium
1.5× credits per 1K tokens
Context: 33K tokens
mistral-small3.2:24b-instruct-2506-q4_K_M
High-performance 24B parameter model
chat Medium
1.5× credits per 1K tokens
Context: 33K tokens
phi3.5:mini
Small model with large context window
chat Small
0.5× credits per 1K tokens
Context: 128K tokens
gemma2:9b
Google Gemma 2 with 9B parameters
chat Medium
1.5× credits per 1K tokens
Context: 8K tokens
deepseek-coder:6.7b
Code-focused model with 6.7B parameters
code Small
1.5× credits per 1K tokens
Context: 16K tokens
nomic-embed-text
Text embedding model
embedding Small
0.5× credits per 1K tokens
Context: 8K tokens
bge-m3
Dense, sparse, and multi-vector embeddings
embedding Small
1.5× credits per 1K tokens
Context: 8K tokens
all-minilm
Lightweight embedding model
embedding Small
0.5× credits per 1K tokens
Context: 1K tokens
snowflake-arctic-embed
High-quality embedding model
embedding Small
1.5× credits per 1K tokens
Context: 1K tokens
mistral:7b
Original Mistral with 7B parameters
chat Medium
1.5× credits per 1K tokens
Context: 33K tokens
starcoder2:3b
Code generation model
code Small
0.5× credits per 1K tokens
Context: 16K tokens
codellama:7b
Code-specific Llama model
code Medium
0.5× credits per 1K tokens
Context: 16K tokens
qwen3:14b
Qwen 3 with 14B parameters, strong reasoning
thinking Medium
1.0× credits per 1K tokens
Context: 33K tokens

Frontier Models (43)

Frontier models provide access to some of the most powerful AI models available, with tens or hundreds of billions of parameters. Most multipliers in this tier range from 2.5× to 3.5×; frontier embedding models run at 1.0×.

glm-4.7
vision Small
2.0× credits per 1K tokens
qwen3-next:80b
Qwen 3 Next with 80B parameters, strong reasoning
thinking Medium
1.0× credits per 1K tokens
Context: 33K tokens
glm-4.7-flash
Zhipu AI multimodal model
vision Medium
2.5× credits per 1K tokens
Context: 128K tokens
qwen3-30b
Qwen 3 with 30B parameters
thinking Large
2.5× credits per 1K tokens
Context: 33K tokens
gpt-oss:20b
OpenAI open source model
chat Medium
2.5× credits per 1K tokens
Context: 128K tokens
qwen3-vl:32b
Qwen3 Vision-Language model
vision Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen3-vl:235b-instruct
Qwen3 Vision-Language 235B flagship multimodal model
vision Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen3.5
Qwen 3.5 with 397B parameters (MoE)
thinking Large
3.5× credits per 1K tokens
Context: 33K tokens
devstral-2:123b
Mistral Devstral 2 with 123B parameters
code Large
3.5× credits per 1K tokens
Context: 128K tokens
deepseek-v3.1:671b
DeepSeek V3.1 with 671B parameters
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
deepseek-v3.2
DeepSeek V3.2 latest version
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3.2:11b
Llama 3.2 with 11B parameters
vision Medium
2.5× credits per 1K tokens
Context: 16K tokens
llama3.2:70b
Llama 3.2 with 70B parameters
chat Large
3.5× credits per 1K tokens
Context: 16K tokens
llama3.1:70b
Llama 3.1 with 70B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3.1:405b
Llama 3.1 with 405B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
gemma2:27b
Google Gemma 2 with 27B parameters
chat Medium
3.5× credits per 1K tokens
Context: 8K tokens
mixtral:8x7b
Mistral Mixtral 8x7B MoE
chat Medium
3.5× credits per 1K tokens
Context: 33K tokens
mixtral:8x22b
Mistral Mixtral 8x22B MoE
chat Medium
3.5× credits per 1K tokens
Context: 66K tokens
mistral-large:24b
Mistral Large with 24B parameters
chat Medium
2.5× credits per 1K tokens
Context: 33K tokens
mistral-nemo:12b
Mistral Nemo with 12B parameters
chat Medium
2.5× credits per 1K tokens
Context: 128K tokens
codestral:22b
Mistral Codestral for code generation
code Medium
2.5× credits per 1K tokens
Context: 33K tokens
deepseek-coder:33b
DeepSeek Coder with 33B parameters
code Large
3.5× credits per 1K tokens
Context: 16K tokens
deepseek-chat:671b
DeepSeek Chat with 671B parameters
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
qwen2:72b
Qwen 2 with 72B parameters
chat Large
3.5× credits per 1K tokens
Context: 33K tokens
qwen2.5:32b
Qwen 2.5 with 32B parameters
chat Large
3.5× credits per 1K tokens
Context: 128K tokens
yi:34b
01.AI Yi with 34B parameters
chat Large
3.5× credits per 1K tokens
Context: 16K tokens
deepseek-v2.5
DeepSeek V2.5 with MoE architecture
thinking Large
3.5× credits per 1K tokens
Context: 128K tokens
llama3-gradient:70b
Llama 3 with extended context
chat Large
3.5× credits per 1K tokens
Context: 262K tokens
command-r:35b
Cohere Command R for RAG
tools Large
3.5× credits per 1K tokens
Context: 128K tokens
command-r-plus:104b
Cohere Command R+ for advanced RAG
tools Large
3.5× credits per 1K tokens
Context: 128K tokens
firefunction-v2:18b
Fireworks FireFunction for function calling
tools Medium
2.5× credits per 1K tokens
Context: 8K tokens
nomic-embed:27m
Nomic Embed for text embeddings
embedding Small
1.0× credits per 1K tokens
Context: 8K tokens
gte-qwen:7m
GTE Qwen embedding model
embedding Small
1.0× credits per 1K tokens
Context: 33K tokens
bge-large:335m
BGE Large embedding model
embedding Small
2.5× credits per 1K tokens
Context: 1K tokens
e5-mistral:7b
E5 Mistral embedding model
embedding Medium
2.5× credits per 1K tokens
Context: 1K tokens
nvidia-embed:1b
NVIDIA NeMo Embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
all-minilm-l6:22m
All-MiniLM L6 v2 embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
gte-base:110m
GTE Base embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
snowflake-arctic-embed-l:335m
Snowflake Arctic Embed Large
embedding Small
2.5× credits per 1K tokens
Context: 1K tokens
bge-small:8m
BGE Small embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
minilm-l12:39m
MiniLM L12 embedding model
embedding Small
1.0× credits per 1K tokens
Context: 1K tokens
glm-5
Zhipu AI GLM-5 latest multimodal model
vision Small
3.5× credits per 1K tokens
Context: 128K tokens
kimi-k2.5
Moonshot Kimi K2.5 reasoning model
thinking Small
3.5× credits per 1K tokens
Context: 128K tokens

Models API Endpoint

Retrieve the full list of available models via the following OpenAI-compatible endpoint:

GET https://llmapi.resayil.io/v1/models
bash
curl https://llmapi.resayil.io/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
json
{
  "object": "list",
  "data": [
    {
      "id": "llama3.2:3b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "llm-resayil"
    },
    {
      "id": "qwen3.5:397b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "llm-resayil"
    }
  ]
}

Note: The endpoint is also accessible at GET /api/v1/models. Both paths return the same list.
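
Because the API is OpenAI-compatible, the standard OpenAI Python SDK can also be pointed at this base URL to list models. The snippet below is a sketch, assuming your key is stored in an environment variable named LLM_API_KEY (a name chosen here for illustration).

python
import os

from openai import OpenAI

# Sketch: reuse the OpenAI client against the OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://llmapi.resayil.io/v1",
    api_key=os.environ["LLM_API_KEY"],  # illustrative variable name
)

# GET /v1/models — iterate over the returned model list.
for model in client.models.list():
    print(model.id)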

Sending Requests

All models share a single OpenAI-compatible endpoint. Simply change the model field to switch models:

POST https://llmapi.resayil.io/v1/chat/completions
json
{
  "model": "mistral-small3.2:24b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 500,
  "stream": false
}
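
The same request can be made through the OpenAI Python SDK; the sketch below mirrors the JSON body above, and switching models only requires changing the model argument.

python
from openai import OpenAI

# Sketch: POST /v1/chat/completions via the OpenAI Python SDK.
client = OpenAI(base_url="https://llmapi.resayil.io/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="mistral-small3.2:24b",  # change only this field to switch models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=500,
)

print(response.choices[0].message.content)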

Every response includes a usage field showing exact token consumption:

json
"usage": {
  "prompt_tokens": 15,
  "completion_tokens": 142,
  "total_tokens": 157
}
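
To track spend programmatically, the usage block can be combined with the model's multiplier. The snippet below is a sketch; the 1.5 multiplier is hard-coded for illustration and should be looked up for the model you actually called.

python
# Sketch: reproduce the credit deduction from a response's usage block.
usage = {"prompt_tokens": 15, "completion_tokens": 142, "total_tokens": 157}
multiplier = 1.5  # illustrative; use the calling model's multiplier

credits_deducted = usage["total_tokens"] * multiplier
print(credits_deducted)  # 235.5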

Model Availability & Status

Full Access Across All Tiers

All subscription tiers have immediate access to all 59 models with no restrictions. The only differentiator is your available credit balance.

Model Updates

We continuously update the model catalog to include the latest and most capable models. New models appear immediately in GET /v1/models results and are ready to use.

Deprecations

If a model is deprecated, at least 30 days' notice is provided along with migration guidance. Notifications are sent via email and dashboard alerts.

Related Resources

Ready to start building?

Learn about the credit system and billing to understand costs.

Go to Billing & Credits →