POST /v1/audio/transcriptions
Transcribes audio into text in the language of the audio.

Request Body

This endpoint accepts multipart/form-data; a raw-request sketch follows the parameter list below.
file
file
required
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB (see Tips for splitting larger files).
model
string
required
The model to use for transcription (e.g., whisper-1).
language
string
The language of the audio in ISO-639-1 format (e.g., en, es, fr). Providing the language improves accuracy.
prompt
string
Optional text to guide the model’s style or continue a previous transcript. Should match the audio language; see the prompt example under Examples.
response_format
string
default:"json"
The output format. Options: json, text, srt, verbose_json, vtt.
temperature
number
default:"0"
Sampling temperature between 0 and 1. Higher values make output more random.
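
Because the endpoint takes multipart/form-data rather than a JSON body, the file travels as a multipart part while the other parameters are ordinary form fields. A minimal raw-request sketch using the third-party requests library, assuming standard Bearer authentication (the same credentials the SDK examples below use):

import requests

API_KEY = "sk-voidai-your_key_here"  # placeholder key, as in the SDK examples

with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.voidai.app/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        # The audio goes in the multipart "file" part...
        files={"file": ("audio.mp3", audio_file, "audio/mpeg")},
        # ...and the remaining parameters are plain form fields
        data={
            "model": "whisper-1",
            "response_format": "json",
            "temperature": "0",
        },
    )

response.raise_for_status()
print(response.json()["text"])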

Response

The response structure varies with response_format:

JSON Response (default)

text
string
The transcribed text.

Verbose JSON Response

task
string
The task performed (transcribe).
language
string
The detected language.
duration
number
Duration of the audio in seconds.
text
string
The transcribed text.
segments
array
Array of transcript segments with timestamps.

Examples

Basic Transcription

from openai import OpenAI

# The standard OpenAI SDK works here by overriding base_url
client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

With Language Hint

with open("spanish_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"
    )
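
With a Prompt

The prompt parameter can steer the spelling of proper nouns and technical terms (see Tips). A short sketch; the terms here are illustrative stand-ins for vocabulary that appears in your own audio:

with open("meeting_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        # Vocabulary the model should spell correctly (illustrative values)
        prompt="VoidAI, Whisper, ISO-639-1"
    )

print(transcript.text)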

SRT Subtitles

with open("video_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

# For srt (and vtt/text), the SDK returns the body as a plain string,
# so it can be written straight to a subtitle file
with open("subtitles.srt", "w") as f:
    f.write(transcript)
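
Plain Text Output

For simple transcripts, response_format="text" drops the JSON wrapper; as with srt, the SDK hands back a plain string:

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

print(transcript)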

Verbose JSON with Timestamps

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")
for segment in transcript.segments:
    # Segments are returned as objects, so use attribute access
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Response Examples

JSON Response

{
  "text": "Hello, this is a test transcription of an audio file."
}

Verbose JSON Response

{
  "task": "transcribe",
  "language": "english",
  "duration": 5.42,
  "text": "Hello, this is a test transcription of an audio file.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test",
      "tokens": [50364, 2425, 11, 341, 307, 257, 1500],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 2.5,
      "end": 5.42,
      "text": " transcription of an audio file.",
      "tokens": [50489, 1112, 11, 295, 364, 6279, 2058],
      "temperature": 0.0,
      "avg_logprob": -0.22,
      "compression_ratio": 1.1,
      "no_speech_prob": 0.02
    }
  ]
}

SRT Response

1
00:00:00,000 --> 00:00:02,500
Hello, this is a test

2
00:00:02,500 --> 00:00:05,420
transcription of an audio file.

Tips

Providing the language parameter improves accuracy, especially for non-English audio or audio with accents.
The prompt parameter can help with proper nouns, technical terms, or specific formatting expectations.
Use srt or vtt for subtitles, verbose_json when you need timestamps, or plain text for simple transcripts.
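
Files larger than the 25 MB limit have to be split before upload. A minimal sketch using the third-party pydub library (which requires ffmpeg), reusing the client from the examples above; the fixed 10-minute chunk length is an arbitrary assumption, and a fixed-interval split can cut words in half, so splitting on silence would be gentler:

from pydub import AudioSegment

CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks; arbitrary choice

audio = AudioSegment.from_file("long_audio.mp3")
parts = []

for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk_path = f"chunk_{i}.mp3"
    # pydub slices by milliseconds
    audio[start:start + CHUNK_MS].export(chunk_path, format="mp3")
    with open(chunk_path, "rb") as chunk_file:
        piece = client.audio.transcriptions.create(
            model="whisper-1",
            file=chunk_file
        )
    parts.append(piece.text)

full_text = " ".join(parts)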