POST /v1/chat/completions
Creates a model response for the given chat conversation. Supports streaming, function calling, and multiple AI providers through a unified interface.
Request Body
model (string, required)
ID of the model to use (e.g., gpt-4, claude-3-5-sonnet, gemini-2.0-flash).

messages (array, required)
A list of messages comprising the conversation so far. Each message has the following fields:

role (string, required)
The role of the message author. One of system, user, assistant, tool, or function.

content (string or array)
The contents of the message. Can be a string, or an array of content parts for multimodal input.

name (string)
An optional name for the participant.

tool_calls (array)
Tool calls generated by the model (assistant messages only).

tool_call_id (string)
The ID of the tool call this message is responding to (tool messages only).

stream (boolean)
If set to true, partial message deltas are sent as server-sent events.

stream_options (object)
Options for streaming responses. If set, usage statistics are included in the final streamed chunk.

temperature (number)
Sampling temperature between 0 and 2. Higher values make output more random. (See the combined example after this list.)

max_tokens (integer)
Maximum number of tokens to generate in the completion.

max_completion_tokens (integer)
An upper bound for the number of tokens that can be generated.

stop (string or array)
Up to 4 sequences where the API will stop generating further tokens.

presence_penalty (number)
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have already appeared in the text, encouraging new topics.

frequency_penalty (number)
Number between -2.0 and 2.0. Positive values penalize new tokens based on how frequently they have appeared in the text, discouraging repetition.

tools (array)
A list of tools the model may call. Each tool has the following fields:

type (string, required)
The type of tool. Currently only function is supported.

function (object, required)
The function definition, including its name, description, and parameters schema.

tool_choice (string or object)
Controls which tool is called: auto lets the model decide, none prevents tool calls, or specify a particular tool.

parallel_tool_calls (boolean)
Whether to enable parallel function calling during tool use.

response_format (object)
Specifies the output format. Either text (the default) or json_object for JSON mode.

reasoning_effort (string)
For reasoning models, controls the effort level: low, medium, or high.
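As a combined sketch, here is a request that sets several of the sampling and stopping parameters above (the values are illustrative, not recommendations, and client is assumed to be configured as in the Examples section below):

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop"}],
    temperature=1.2,        # more varied output
    max_tokens=64,          # cap the completion length
    stop=["\n\n"],          # stop at the first blank line
    presence_penalty=0.5,   # nudge toward new topics
    frequency_penalty=0.5   # discourage verbatim repetition
)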
Response
id (string)
A unique identifier for the chat completion.

object (string)
The object type, always chat.completion.

created (integer)
Unix timestamp of when the completion was created.

model (string)
The model used for the completion.

choices (array)
A list of chat completion choices. Each choice has the following fields:

index (integer)
The index of this choice in the list.

message (object)
The message generated by the model, with role and content.

finish_reason (string)
The reason the model stopped generating: stop, length, tool_calls, etc.

usage (object)
Token usage statistics: prompt_tokens (tokens in the prompt), completion_tokens (tokens in the completion), and total_tokens.
Examples
Basic Completion
from openai import OpenAI

client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
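To also receive token usage when streaming, set stream_options as described above. Following the OpenAI convention, the final chunk then has an empty choices list and carries a usage object; a minimal sketch:

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:  # only present on the final chunk
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")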
Function Calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
Vision (Multimodal)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        "detail": "high"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
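Local images can also be sent inline as base64 data URLs instead of public links, using the same image_url content-part shape (the file path here is illustrative):

import base64

with open("photo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}
}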
JSON Mode
import json

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in JSON."},
        {"role": "user", "content": "List 3 programming languages with their year of creation"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)
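Reasoning Effort
The reasoning_effort parameter described above applies only to reasoning models. A sketch; the model name is illustrative and depends on which reasoning models your key can access:

response = client.chat.completions.create(
    model="o3-mini",  # illustrative reasoning model
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    reasoning_effort="high"
)

print(response.choices[0].message.content)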
Response Example
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1701691200,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
Streaming Response
When stream is set to true, the response arrives as a series of server-sent events, each carrying a chat.completion.chunk object, terminated by data: [DONE]:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}
data: [DONE]
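Without an SDK, the stream can be consumed directly over HTTP. A minimal sketch using the requests library, with error handling omitted:

import json
import requests

resp = requests.post(
    "https://api.voidai.app/v1/chat/completions",
    headers={"Authorization": "Bearer sk-voidai-your_key_here"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Tell me a short story"}],
        "stream": True
    },
    stream=True
)

for line in resp.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="")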