POST /v1/chat/completions
Creates a model response for the given chat conversation. Supports streaming, function calling, and multiple AI providers through a unified interface.
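
Because the request body follows the OpenAI schema, the endpoint can also be called directly over HTTP without an SDK. A minimal sketch, assuming standard Bearer authentication with your API key:

import requests

resp = requests.post(
    "https://api.voidai.app/v1/chat/completions",
    headers={"Authorization": "Bearer sk-voidai-your_key_here"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])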

Request Body

model (string, required)
ID of the model to use (e.g., gpt-4, claude-3-5-sonnet, gemini-2.0-flash).

messages (array, required)
A list of messages comprising the conversation so far.

stream (boolean, default: false)
If set to true, partial message deltas are sent as server-sent events.

stream_options (object)
Options for streaming responses.

temperature (number, default: 1)
Sampling temperature between 0 and 2. Higher values make output more random.

max_tokens (integer)
Maximum number of tokens to generate in the completion.

max_completion_tokens (integer)
An upper bound on the number of tokens that can be generated.

stop (string | array)
Up to 4 sequences where the API will stop generating tokens.

presence_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far.

frequency_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.

tools (array)
A list of tools the model may call.

tool_choice (string | object)
Controls which tool is called: auto lets the model decide, none prevents tool calls, or pass an object to force a specific tool.

parallel_tool_calls (boolean, default: true)
Whether to enable parallel function calling during tool use.

response_format (object)
Specifies the output format, e.g. {"type": "json_object"} for JSON mode.

reasoning_effort (string)
For reasoning models, controls the effort level: low, medium, or high.
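
A request exercising several of these parameters together (values here are illustrative; client is the configured SDK client from the Basic Completion example below):

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    temperature=0.7,       # slightly less random than the default of 1
    max_tokens=200,        # cap the completion length
    stop=["THE END"],      # stop early if this sequence is generated
    presence_penalty=0.5,  # nudge the model toward new topics
)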

Response

id (string)
A unique identifier for the chat completion.

object (string)
The object type, always chat.completion.

created (integer)
Unix timestamp of when the completion was created.

model (string)
The model used for the completion.

choices (array)
A list of chat completion choices.

usage (object)
Token usage statistics.
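
These fields map directly onto the SDK's response object; for example, inspecting a completion returned by the examples below:

print(response.id, response.model)
print(response.choices[0].finish_reason)
print(response.usage.total_tokens)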

Examples

Basic Completion

from openai import OpenAI

client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True
)

for chunk in stream:
    # Guard against chunks with no choices (e.g., the final usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
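
To also receive token usage for a streamed response, stream_options can request a final usage chunk. A minimal sketch, assuming the OpenAI-style include_usage option (provider support may vary):

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
    stream_options={"include_usage": True},  # assumed OpenAI-style option
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:  # the final chunk carries usage and has no choices
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")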

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Vision (Multimodal)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        "detail": "high"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
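
Local images can be sent inline as base64 data URLs instead of hosted URLs, following the standard OpenAI image_url format:

import base64

with open("photo.jpg", "rb") as f:  # assumes a local JPEG file
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
)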

JSON Mode

import json

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in JSON."},
        {"role": "user", "content": "List 3 programming languages with their year of creation"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)

Response Example

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1701691200,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming Response

When stream: true, responses are sent as server-sent events:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
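
When calling the endpoint over raw HTTP, the event stream can be parsed line by line. A minimal sketch using requests (same Bearer-auth assumption as above):

import json
import requests

resp = requests.post(
    "https://api.voidai.app/v1/chat/completions",
    headers={"Authorization": "Bearer sk-voidai-your_key_here"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Tell me a short story"}],
        "stream": True,
    },
    stream=True,  # iterate over the response as bytes arrive
)

for line in resp.iter_lines():
    if not line:
        continue  # skip keep-alive blank lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(payload)
    if chunk["choices"]:
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="")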