Create Speech

POST /v1/audio/speech

Generates audio from text input using text-to-speech models.

Request Body

model (string, required)

The TTS model to use. Options include tts-1 and tts-1-hd.

input (string, required)

The text to generate audio for. Maximum length is 4096 characters.

voice (string, required)

The voice to use. Supported voices: alloy, echo, fable, onyx, nova, shimmer.

response_format (string, default: "mp3")

The audio format. Supported formats: mp3, opus, aac, flac, wav, pcm.

speed (number, default: 1.0)

The playback speed multiplier for the generated audio. Range: 0.25 to 4.0.
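The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_speech_payload` is a hypothetical helper, and it deliberately does not restrict `model`, since the description says the options "include" tts-1 and tts-1-hd and the list may not be exhaustive.

```python
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(model, input_text, voice, response_format="mp3", speed=1.0):
    """Return a JSON-ready request body, raising ValueError on an invalid field.

    Hypothetical helper: the limits mirror the parameter descriptions on this page.
    """
    if len(input_text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(input_text)} characters; maximum is {MAX_INPUT_CHARS}")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {
        "model": model,
        "input": input_text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Failing early in this way avoids spending a network round trip on a request the server would presumably reject.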

Response

Returns the audio file content in the requested format. The response has the appropriate Content-Type header based on the format.
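Because the response body is binary audio rather than JSON, it helps to see the call without the SDK. The sketch below uses only the Python standard library. The base URL is taken from the SDK examples on this page; the `Bearer` authorization header is an assumption based on the OpenAI-compatible client, and `make_speech_request` and `save_speech` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.voidai.app/v1"  # same base_url as the SDK examples on this page


def make_speech_request(api_key, payload):
    """Build (but do not send) the POST request for /audio/speech."""
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: OpenAI-style bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path):
    """Send the request, write the binary body to `path`, and return the Content-Type header."""
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
        return resp.headers.get("Content-Type")
```

A call such as `save_speech(make_speech_request(key, {"model": "tts-1", "input": "Hi", "voice": "alloy"}), "output.mp3")` would write the audio to disk and return the header value, which can be compared against the requested format.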

Examples

Basic Text-to-Speech

from openai import OpenAI

# Point the OpenAI SDK at the VoidAI endpoint.
client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello! This is a test of the text-to-speech API."
)

# Save the returned audio; the default format is MP3.
response.write_to_file("output.mp3")
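A single request accepts at most 4096 characters of input, so longer text has to be split across several calls. The following sketch shows one possible approach: `split_text` is a hypothetical helper that prefers to cut at a sentence end, then at a space, and only hard-splits when neither exists.

```python
def split_text(text, limit=4096):
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers sentence boundaries, then word boundaries.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Look for the last sentence end inside the window.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with the chunk
        else:
            cut = window.rfind(" ")  # fall back to the last space
        if cut <= 0:
            cut = limit  # no boundary found: hard split at the limit
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be passed as `input` to its own `client.audio.speech.create` call. How the resulting audio files are joined depends on the output format and is outside the scope of this sketch.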

High-Definition Audio

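# Reuses the client from the basic example.
response = client.audio.speech.create(
    model="tts-1-hd",
    voice="nova",
    input="Welcome to VoidAI! Experience the power of unified AI APIs.",
    response_format="flac"
)

# Match the file extension to the requested response_format.
response.write_to_file("output.flac")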

Adjusting Speed

# Slower speech (0.5x speed)
slow_response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="This is spoken more slowly for clarity.",
    speed=0.5
)
slow_response.write_to_file("slow.mp3")

# Faster speech (1.5x speed)
fast_response = client.audio.speech.create(
    model="tts-1",
    voice="onyx",
    input="This is spoken more quickly.",
    speed=1.5
)
fast_response.write_to_file("fast.mp3")

Voice Descriptions

| Voice | Description |
| --- | --- |
| `alloy` | Neutral, balanced voice |
| `echo` | Warm, conversational voice |
| `fable` | Expressive, narrative voice |
| `onyx` | Deep, authoritative voice |
| `nova` | Friendly, energetic voice |
| `shimmer` | Clear, refined voice |
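Written descriptions only go so far, so it can be useful to render the same sentence in every voice and compare by ear. The sketch below assumes a `client` configured as in the basic example; `render_voice_samples` is a hypothetical helper, and it makes one API call per voice.

```python
from pathlib import Path

VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def render_voice_samples(client, text, out_dir="voice_samples", model="tts-1"):
    """Generate one MP3 per supported voice and return the written paths.

    Hypothetical helper: `client` is an OpenAI SDK client pointed at the API.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice in VOICES:
        response = client.audio.speech.create(model=model, voice=voice, input=text)
        path = out / f"{voice}.mp3"  # default response_format is mp3
        response.write_to_file(path)
        paths.append(path)
    return paths
```

Calling `render_voice_samples(client, "The quick brown fox jumps over the lazy dog.")` would leave six files such as `voice_samples/alloy.mp3` for side-by-side listening.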

Audio Format Comparison

| Format | Quality | File Size | Use Case |
| --- | --- | --- | --- |
| `mp3` | Good | Small | General use, web streaming |
| `opus` | Excellent | Small | Real-time streaming |
| `aac` | Good | Small | Mobile apps |
| `flac` | Lossless | Large | Archival, high quality |
| `wav` | Lossless | Large | Professional editing |
| `pcm` | Raw | Large | Audio processing |
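Raw `pcm` output has no header, so most players cannot open it until it is wrapped in a container. The sketch below uses Python's standard `wave` module to do that. The default sample rate, bit depth, and channel count (24 kHz, 16-bit, mono) are assumptions borrowed from what OpenAI documents for its own endpoint; this page does not state them, so confirm them before relying on the output. `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=24000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container.

    Assumed defaults (not stated on this page): 24 kHz, mono, 16-bit samples.
    """
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```

If the wrapped file plays too fast, too slow, or as noise, the assumed parameters do not match what the API actually returns.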

Use tts-1 for lower latency and tts-1-hd for higher quality audio generation.
