Audio Generation
Provider Disclosure: VoidAI offers audio generation services powered by multiple providers, including OpenAI. The specific provider used depends on the model you select in your API call.
VoidAI supports audio output directly from the chat completions API, leveraging advanced technology from our provider partners. This allows you to generate spoken audio responses from various models. Audio capabilities let you:
- Generate spoken audio summaries of text (text in, audio out)
- Create voice responses for conversational AI applications
- Develop multi-modal applications with both text and audio output
Quickstart
To generate audio, use the chat completions endpoint with the OpenAI Python SDK pointed at the VoidAI base URL:
from openai import OpenAI
import base64

client = OpenAI(api_key="yourapikey", base_url="https://api.voidai.app/v1")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

# When audio output is requested, the text transcript is returned on the
# audio object rather than in message.content
print(completion.choices[0].message.audio.transcript)

# Save the audio response
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("response.wav", "wb") as f:
    f.write(wav_bytes)
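Depending on the model and requested modalities, a response may or may not carry audio, so a defensive accessor avoids attribute errors when the field is absent. The sketch below is illustrative; the `extract_audio` helper name is ours, not part of the API:

```python
import base64

def extract_audio(message):
    """Return the decoded audio bytes if the message carries audio, else None."""
    audio = getattr(message, "audio", None)
    if audio is None or not getattr(audio, "data", None):
        return None
    return base64.b64decode(audio.data)
```

You would call it as `extract_audio(completion.choices[0].message)` and fall back to text-only handling when it returns None.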
Supported Voices
Audio generation supports several voice options:
- alloy - A neutral voice with balanced tone
- echo - A deeper, more authoritative voice
- fable - A soft, friendly voice with warmth
- nova - A professional, clear voice
- shimmer - A bright, energetic voice
Supported Audio Formats
You can request audio output in the following formats:
- wav - High-quality uncompressed audio
- mp3 - Compressed audio with good quality and smaller file size
- opus - Optimized for voice applications with low bandwidth
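Since the format you request determines what kind of bytes you decode, a small helper can pick a matching file extension when saving the response. This is a local sketch only; the `save_audio` name and the extension mapping are illustrative, not part of the API:

```python
import base64

# Map each supported output format to a file extension (illustrative mapping)
FORMAT_EXTENSIONS = {"wav": ".wav", "mp3": ".mp3", "opus": ".opus"}

def save_audio(b64_data, fmt, stem="response"):
    """Decode base64 audio from the API and write it with a matching extension."""
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {fmt}")
    path = stem + FORMAT_EXTENSIONS[fmt]
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))
    return path
```

For example, `save_audio(completion.choices[0].message.audio.data, "mp3")` writes response.mp3.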
Continuing Conversations
You can build conversation chains that include audio by referencing previous audio responses:
from openai import OpenAI

client = OpenAI(api_key="yourapikey", base_url="https://api.voidai.app/v1")

# Initial request with audio response
first_response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Tell me about quantum computing"
        }
    ]
)

# Save the ID of the first audio response so it can be referenced later
audio_id = first_response.choices[0].message.audio.id

# Continue the conversation, passing the previous audio back by ID
second_response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Tell me about quantum computing"
        },
        {
            "role": "assistant",
            "audio": {
                "id": audio_id
            }
        },
        {
            "role": "user",
            "content": "What are the practical applications?"
        }
    ]
)

print(second_response.choices[0].message.audio.transcript)
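For longer exchanges, it helps to keep the running messages list in small helpers that record each assistant turn by its audio ID rather than re-sending audio bytes. The sketch below covers only that bookkeeping (no API call); the function names are ours, not part of the SDK:

```python
def append_user(history, text):
    """Add a user text turn to the conversation history."""
    history.append({"role": "user", "content": text})
    return history

def append_assistant_audio(history, audio_id):
    """Record an assistant audio turn by reference, using its audio ID."""
    history.append({"role": "assistant", "audio": {"id": audio_id}})
    return history
```

After each completion, call `append_assistant_audio(messages, response.choices[0].message.audio.id)` before appending the next user message and sending the list again.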
Limitations and Considerations
- Audio generation may increase response times compared to text-only outputs
- Each model has different capabilities for generating audio
- For the highest quality text-to-speech conversion for pre-determined text, consider using the dedicated TTS API