Chat Completions API
The `/v1/chat/completions` endpoint allows you to interact with advanced large language models (LLMs) through a unified API, supporting both OpenAI-compatible and Anthropic-compatible models. The endpoint supports conversational AI, function calling, streaming, and multimodal (text + image) input, depending on model capabilities.
Endpoint
POST https://api.llm.vin/v1/chat/completions
Authentication
- API Key (optional): You may provide an API key via the `Authorization: Bearer ...` header.
- Authenticated users may have access to additional models or higher rate limits.
- Unauthenticated requests are allowed but may have restricted model access.
Request Format
{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Write a one-sentence bedtime story about a unicorn."
}
],
"temperature": 1.0,
"max_tokens": 256,
"stream": false,
"tools": [],
"tool_choice": null,
"stop": null
}
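The body above can be assembled and sent with nothing beyond the standard library. A sketch, assuming the documented defaults; `build_payload` and `post_chat` are illustrative helper names, not part of the API:

```python
import json
import urllib.request

API_URL = "https://api.llm.vin/v1/chat/completions"

def build_payload(model: str, user_message: str, **overrides) -> dict:
    """Assemble a chat completion request body with the documented defaults."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 1.0,
        "max_tokens": 256,
        "stream": False,
    }
    payload.update(overrides)  # e.g. max_tokens=64, stream=True
    return payload

def post_chat(payload: dict) -> dict:
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```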
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | The ID of the model to use for chat. See Available Models. |
| `messages` | array | List of message objects representing the conversation history. Each message must have a `role` (`user`, `assistant`, or `system`) and `content`. |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 1.0 | Sampling temperature (higher values produce more random output). |
| `max_tokens` | integer | 256 | Maximum number of tokens to generate in the response. |
| `stream` | boolean | false | If true, the response is sent as a stream of data chunks (see Streaming). |
| `tools` | array | [] | List of tool definitions for function calling (if supported by the model). |
| `tool_choice` | string/object/null | null | Which tool to use when multiple are provided. |
| `stop` | string/array/null | null | Sequences at which the API will stop generating further tokens. |
Multimodal Input
- If the model supports `image_input`, you may include images in the `messages` array as follows:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}
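Producing the `data:image/png;base64,...` URL from raw image bytes takes only the standard library. A sketch; the helper names are illustrative:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for image_url.url."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def image_message(question: str, image_bytes: bytes) -> dict:
    """Build a multimodal user message pairing text with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }
```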
Response Format
{
"id": "chatcmpl-1716151540",
"object": "chat.completion",
"created": 1716151540,
"model": "grok-3-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Once upon a time, a unicorn danced across the stars and wished you sweet dreams."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 18,
"completion_tokens": 22,
"total_tokens": 40
}
}
- `id`: Unique identifier for the chat completion request.
- `object`: Type of object returned.
- `created`: Unix timestamp of creation.
- `model`: Model ID used for the completion.
- `choices`: Array of response choices, each with:
  - `index`: Index of the choice.
  - `message`: The assistant's reply (`role` and `content`).
  - `finish_reason`: Why the completion stopped (e.g., `stop`, `length`, `tool_calls`).
- `usage`: Token usage statistics.
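Given a parsed response of this shape, the assistant's text is at a fixed path. A sketch that assumes at least one choice is present:

```python
def first_reply(response: dict) -> str:
    """Return the assistant message content of the first choice."""
    return response["choices"][0]["message"]["content"]

# Trimmed sample response matching the documented shape.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Sweet dreams."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 22, "total_tokens": 40},
}
```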
Tool Calls (Function Calling)
If the model supports function calling and a tool is invoked, the response may include a `tool_calls` array in the message:
"tool_calls": [
{
"id": "call-abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\": \"Paris\"}"
}
}
]
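Note that `arguments` arrives as a JSON-encoded string, not an object, so it must be decoded before use. A sketch:

```python
import json

def parse_tool_calls(message: dict) -> list:
    """Decode each tool call's JSON-encoded arguments string."""
    calls = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Message fragment matching the documented tool_calls shape.
message = {
    "tool_calls": [
        {
            "id": "call-abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        }
    ]
}
```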
Streaming
If you set `"stream": true`, the response is sent as a series of Server-Sent Events (SSE), each chunk containing a partial completion:
- Each chunk is a JSON object prefixed by `data:` and followed by two newlines.
- The stream ends with `data: [DONE]`.
Example stream chunk:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":...,"model":"grok-3-mini","choices":[{"index":0,"delta":{"content":"Once upon a time,"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":...,"model":"grok-3-mini","choices":[{"index":0,"delta":{"content":" a unicorn"},"finish_reason":null}]}
data: [DONE]
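The chunks above can be reassembled client-side by stripping the `data:` prefix and concatenating each delta until `[DONE]`. A minimal sketch over an iterable of raw SSE lines:

```python
import json

def collect_stream(lines) -> str:
    """Accumulate delta content from SSE lines until data: [DONE]."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Trimmed chunks matching the documented stream format.
stream = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once upon a time,"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a unicorn"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
```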
Available Models
Use the `/v1/models` endpoint to list available models and their capabilities.

| Model ID | Description | Capabilities |
|---|---|---|
| `grok-3-mini` | Advanced conversational LLM with function calling and tool support. | chat_completions, function_calling |
| … | (Other models may be available depending on configuration.) | … |
- Only models with the `chat_completions` capability can be used here.
- Some models may support `image_input` and/or `function_calling`.
Example Requests
Basic Chat Completion
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Write a one-sentence bedtime story about a unicorn."
}
]
}'
Streaming Chat Completion
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Tell me a joke."
}
],
"stream": true
}'
Chat with Function Calling
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "What is the weather in Paris?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": { "type": "string" }
},
"required": ["city"]
}
}
}
]
}'
Multimodal (Text + Image) Chat
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What is in this image?" },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
]
}
]
}'
Error Handling
The API returns standard HTTP status codes:
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (missing or invalid parameters) |
| 401 | Unauthorized (invalid or missing API key) |
| 403 | Forbidden (insufficient permissions for model) |
| 404 | Not found (invalid model) |
| 429 | Too many requests (rate limit exceeded) |
| 500 | Server error |
Error responses include a JSON object:
{
"error": {
"message": "Model 'grok-3-mini' not found",
"type": "invalid_request_error",
"code": "model_not_found"
}
}
Rate Limits
- Chat completions: 500 requests per day and 10 requests per minute per IP.
- Other endpoints: 50,000 requests per day per IP.
- Limits may be higher for authenticated users.
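When the per-minute limit is hit (HTTP 429), retrying with exponential backoff is a common client-side response. A sketch with an injectable `sleep` for testability; the `RateLimitError` type is illustrative:

```python
import time

class RateLimitError(Exception):
    """Raised when the API answers 429 (illustrative exception type)."""

def with_backoff(call, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry `call` on RateLimitError, doubling the delay each attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries; surface the 429 to the caller
            sleep(base_delay * (2 ** attempt))
```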
Notes
- Use `/v1/models` to discover available models and their capabilities.
- Function calling and tool use are only supported by models with those capabilities enabled.
- Streaming responses are available by setting `stream: true`.
- Multimodal (image) input is only supported by models with the `image_input` capability.
- If you provide an API key, you may have access to more models or higher rate limits.
- All requests and errors are logged for security and debugging.