POST /v1/audio/speech
Generates audio from text input using text-to-speech models.

Request Body

model (string, required)
The TTS model to use. Options include tts-1 and tts-1-hd.

input (string, required)
The text to generate audio for. Maximum length is 4096 characters.

voice (string, required)
The voice to use. Supported voices: alloy, echo, fable, onyx, nova, shimmer.

response_format (string, default: "mp3")
The audio format. Supported formats: mp3, opus, aac, flac, wav, pcm.

speed (number, default: 1.0)
The speed of the generated audio. Range: 0.25 to 4.0.
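
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_speech_payload is hypothetical, and it assumes the speed range is inclusive at both ends. The model is only checked for being non-empty, since the list of models above is not described as exhaustive.

```python
# Constraints copied from the parameter descriptions above.
SUPPORTED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # assumed inclusive


def build_speech_payload(model, input_text, voice, response_format="mp3", speed=1.0):
    """Hypothetical helper: validate arguments locally and return the JSON body."""
    if not model:
        raise ValueError("model is required")
    if not input_text:
        raise ValueError("input is required")
    if len(input_text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": input_text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The server remains the source of truth. A local check like this only lets obviously invalid requests fail early.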

Response

Returns the audio file content in the requested format. The response has the appropriate Content-Type header based on the format.
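
For readers not using the OpenAI SDK, the request and response described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the URL is the base_url from the examples on this page joined with the endpoint path, the Bearer authorization scheme is assumed from the OpenAI-compatible client usage, and the function names are hypothetical.

```python
import json
import urllib.request

# base_url from the examples on this page plus the endpoint path.
API_URL = "https://api.voidai.app/v1/audio/speech"


def build_speech_request(api_key, payload):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_speech(api_key, payload):
    """Send the request and return (content_type, audio_bytes)."""
    request = build_speech_request(api_key, payload)
    with urllib.request.urlopen(request) as response:
        # The body is the raw audio; Content-Type reflects the requested format.
        return response.headers.get("Content-Type"), response.read()
```

A caller would write the returned bytes to a file whose extension matches response_format.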

Examples

Basic Text-to-Speech

from openai import OpenAI

client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello! This is a test of the text-to-speech API."
)

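# write_to_file saves the full response body; recent versions of the
# OpenAI Python SDK mark stream_to_file on this response type as deprecated.
response.write_to_file("output.mp3")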
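
Because input is limited to 4096 characters, longer texts have to be split across several requests. The sketch below shows one possible approach: it splits on sentence-ending punctuation and falls back to a hard cut for any single sentence that exceeds the limit. The function name split_text and the splitting rule are illustrative choices, not something the API prescribes.

```python
import re

MAX_INPUT_CHARS = 4096  # documented maximum length of the input parameter


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit
        # (possibly mid-word), after flushing whatever was accumulated.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the input of a separate request.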

High-Definition Audio

response = client.audio.speech.create(
    model="tts-1-hd",
    voice="nova",
    input="Welcome to VoidAI! Experience the power of unified AI APIs.",
    response_format="flac"
)

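# write_to_file saves the full response body; recent versions of the
# OpenAI Python SDK mark stream_to_file on this response type as deprecated.
response.write_to_file("output.flac")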

Adjusting Speed

# Reuses the client created in the basic example.

# Slower speech (0.5x speed)
slow_response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="This is spoken more slowly for clarity.",
    speed=0.5
)
slow_response.write_to_file("slow.mp3")

# Faster speech (1.5x speed)
fast_response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="This is spoken more quickly.",
    speed=1.5
)
fast_response.write_to_file("fast.mp3")

Voice Descriptions

| Voice | Description |
| --- | --- |
| alloy | Neutral, balanced voice |
| echo | Warm, conversational voice |
| fable | Expressive, narrative voice |
| onyx | Deep, authoritative voice |
| nova | Friendly, energetic voice |
| shimmer | Clear, refined voice |
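
A practical way to choose between the voices listed above is to render the same sentence once per voice and listen to the results. The following sketch assumes an OpenAI-compatible client object like the one created in the basic example. The helper name generate_voice_samples and the output directory are hypothetical.

```python
from pathlib import Path

# Voices listed in the request body section.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def generate_voice_samples(client, text, out_dir="samples", model="tts-1"):
    """Hypothetical helper: render `text` once per voice and return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice in VOICES:
        response = client.audio.speech.create(model=model, voice=voice, input=text)
        path = out / f"{voice}.mp3"  # mp3 is the documented default format
        path.write_bytes(response.read())  # read() returns the full audio body
        paths.append(path)
    return paths
```

Calling generate_voice_samples(client, "The quick brown fox.") issues six requests, one per voice, so it counts against usage accordingly.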

Audio Format Comparison

| Format | Quality | File Size | Use Case |
| --- | --- | --- | --- |
| mp3 | Good | Small | General use, web streaming |
| opus | Excellent | Small | Real-time streaming |
| aac | Good | Small | Mobile apps |
| flac | Lossless | Large | Archival, high quality |
| wav | Lossless | Large | Professional editing |
| pcm | Raw | Large | Audio processing |

Use tts-1 for lower latency and tts-1-hd for higher quality audio generation.
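
The pcm format in the comparison above is raw sample data without a header, so most players cannot open it directly. The sketch below wraps raw PCM bytes in a WAV container using Python's standard wave module. The sample parameters are assumptions that this page does not state: 24 kHz, 16-bit signed, mono is what OpenAI documents for its own pcm output, and it should be verified against actual VoidAI output before being relied on. The function name pcm_to_wav is hypothetical.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container.

    Defaults are ASSUMPTIONS (24 kHz, mono, 16-bit samples), not values
    documented on this page; adjust them to match the actual output.
    """
    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```

If the assumed parameters are wrong, the file will still be written but will play at the wrong pitch or speed, which is a quick way to spot a mismatch.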