Chat Completions API
The `/v1/chat/completions` endpoint allows you to interact with advanced large language models (LLMs) through a unified API, supporting both OpenAI-compatible and Anthropic-compatible models. The endpoint supports conversational AI, function calling, streaming, and multimodal (text + image) input, depending on model capabilities.
Endpoint
POST https://api.llm.vin/v1/chat/completions
Authentication
- API Key (optional): You may provide an API key via the `Authorization: Bearer ...` header.
- Authenticated users may have access to additional models or higher rate limits.
- Unauthenticated requests are allowed but may have restricted model access.
Request Format
{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Write a one-sentence bedtime story about a unicorn."
}
],
"temperature": 1.0,
"max_tokens": 256,
"stream": false,
"tools": [],
"tool_choice": null,
"stop": null
}
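The body above can be assembled and sent with nothing beyond the standard library. A sketch, assuming the documented defaults; `build_payload` and `post_chat` are illustrative helper names, not part of the API:

```python
import json
import urllib.request

API_URL = "https://api.llm.vin/v1/chat/completions"

def build_payload(model: str, user_message: str, **overrides) -> dict:
    """Assemble a chat completion request body with the documented defaults."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 1.0,
        "max_tokens": 256,
        "stream": False,
    }
    payload.update(overrides)  # e.g. max_tokens=64, stream=True
    return payload

def post_chat(payload: dict) -> dict:
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```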
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | The ID of the model to use for chat. See Available Models. |
| `messages` | array | List of message objects representing the conversation history. Each message must have a `role` (`user`, `assistant`, or `system`) and `content`. |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 1.0 | Sampling temperature (higher values produce more random output). |
| `max_tokens` | integer | 256 | Maximum number of tokens to generate in the response. |
| `stream` | boolean | false | If true, the response is sent as a stream of data chunks (see Streaming). |
| `tools` | array | [] | List of tool definitions for function calling (if supported by the model). |
| `tool_choice` | string/object/null | null | Which tool to use when multiple are provided. |
| `stop` | string/array/null | null | Sequences at which the API will stop generating further tokens. |
Multimodal Input
- If the model supports `image_input`, you may include images in the `messages` array as follows:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}
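Producing the `data:image/png;base64,...` URL from raw image bytes takes only the standard library. A sketch; the helper names are illustrative:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for image_url.url."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def image_message(question: str, image_bytes: bytes) -> dict:
    """Build a multimodal user message pairing text with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }
```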
Response Format
{
"id": "chatcmpl-1716151540",
"object": "chat.completion",
"created": 1716151540,
"model": "grok-3-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Once upon a time, a unicorn danced across the stars and wished you sweet dreams."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 18,
"completion_tokens": 22,
"total_tokens": 40
}
}
- `id`: Unique identifier for the chat completion request.
- `object`: Type of object returned.
- `created`: Unix timestamp of creation.
- `model`: Model ID used for the completion.
- `choices`: Array of response choices, each with:
  - `index`: Index of the choice.
  - `message`: The assistant's reply (`role` and `content`).
  - `finish_reason`: Why the completion stopped (e.g., `stop`, `length`, `tool_calls`).
- `usage`: Token usage statistics.
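Given a parsed response of this shape, the assistant's text is at a fixed path. A sketch that assumes at least one choice is present:

```python
def first_reply(response: dict) -> str:
    """Return the assistant message content of the first choice."""
    return response["choices"][0]["message"]["content"]

# Trimmed sample response matching the documented shape.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Sweet dreams."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 22, "total_tokens": 40},
}
```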
Tool Calls (Function Calling)
If the model supports function calling and a tool is invoked, the response may include a `tool_calls` array in the message:
"tool_calls": [
{
"id": "call-abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\": \"Paris\"}"
}
}
]
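Note that `arguments` arrives as a JSON-encoded string, not an object, so it must be decoded before use. A sketch:

```python
import json

def parse_tool_calls(message: dict) -> list:
    """Decode each tool call's JSON-encoded arguments string."""
    calls = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Message fragment matching the documented tool_calls shape.
message = {
    "tool_calls": [
        {
            "id": "call-abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        }
    ]
}
```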
Streaming
If you set `"stream": true`, the response is sent as a series of Server-Sent Events (SSE), each chunk containing a partial completion:
- Each chunk is a JSON object prefixed by `data:` and followed by two newlines.
- The stream ends with `data: [DONE]`.
Example stream chunk:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":...,"model":"grok-3-mini","choices":[{"index":0,"delta":{"content":"Once upon a time,"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":...,"model":"grok-3-mini","choices":[{"index":0,"delta":{"content":" a unicorn"},"finish_reason":null}]}
data: [DONE]
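The chunks above can be reassembled client-side by stripping the `data:` prefix and concatenating each delta until `[DONE]`. A minimal sketch over an iterable of raw SSE lines:

```python
import json

def collect_stream(lines) -> str:
    """Accumulate delta content from SSE lines until data: [DONE]."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Trimmed chunks matching the documented stream format.
stream = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once upon a time,"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a unicorn"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
```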
Available Models
Use the `/v1/models` endpoint to list available models and their capabilities.

| Model ID | Description | Capabilities |
|---|---|---|
| `grok-3-mini` | Advanced conversational LLM with function calling and tool support. | chat_completions, function_calling |
| … | (Other models may be available depending on configuration.) | … |
- Only models with the `chat_completions` capability can be used here.
- Some models may support `image_input` and/or `function_calling`.
Example Requests
Basic Chat Completion
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Write a one-sentence bedtime story about a unicorn."
}
]
}'
Streaming Chat Completion
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "Tell me a joke."
}
],
"stream": true
}'
Chat with Function Calling
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": "What is the weather in Paris?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": { "type": "string" }
},
"required": ["city"]
}
}
}
]
}'
Multimodal (Text + Image) Chat
curl "https://api.llm.vin/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3-mini",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What is in this image?" },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
]
}
]
}'
Error Handling
The API returns standard HTTP status codes:
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (missing or invalid parameters) |
| 401 | Unauthorized (invalid or missing API key) |
| 403 | Forbidden (insufficient permissions for model) |
| 404 | Not found (invalid model) |
| 429 | Too many requests (rate limit exceeded) |
| 500 | Server error |
Error responses include a JSON object:
{
"error": {
"message": "Model 'grok-3-mini' not found",
"type": "invalid_request_error",
"code": "model_not_found"
}
}
Rate Limits
- Chat completions: 500 requests per day and 10 requests per minute per IP.
- Other endpoints: 50,000 requests per day per IP.
- Limits may be higher for authenticated users.
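When the per-minute limit is hit (HTTP 429), retrying with exponential backoff is a common client-side response. A sketch with an injectable `sleep` for testability; the `RateLimitError` type is illustrative:

```python
import time

class RateLimitError(Exception):
    """Raised when the API answers 429 (illustrative exception type)."""

def with_backoff(call, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry `call` on RateLimitError, doubling the delay each attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries; surface the 429 to the caller
            sleep(base_delay * (2 ** attempt))
```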
Notes
- Use `/v1/models` to discover available models and their capabilities.
- Function calling and tool use are only supported by models with those capabilities enabled.
- Streaming responses are available by setting `stream: true`.
- Multimodal (image) input is only supported by models with the `image_input` capability.
- If you provide an API key, you may have access to more models or higher rate limits.
- All requests and errors are logged for security and debugging.