POST /v1/audio/transcriptions
Transcribes audio into text in the language of the audio.

Request Body

This endpoint accepts multipart/form-data; a raw-request sketch follows the parameter list below.
file
file
required
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB (see Tips for splitting larger files).
model
string
required
The model to use for transcription (e.g., whisper-1).
language
string
The language of the audio in ISO-639-1 format (e.g., en, es, fr). Providing the language improves accuracy.
prompt
string
Optional text to guide the model’s style or continue a previous transcript. Should match the audio language; see the prompt example under Examples.
response_format
string
default:"json"
The output format. Options: json, text, srt, verbose_json, vtt.
temperature
number
default:"0"
Sampling temperature between 0 and 1. Higher values make output more random.
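
Because the endpoint takes multipart/form-data rather than a JSON body, the file travels as a multipart part while the other parameters are ordinary form fields. A minimal raw-request sketch using the third-party requests library, assuming standard Bearer authentication (the same credentials the SDK examples below use):

import requests

API_KEY = "sk-voidai-your_key_here"  # placeholder key, as in the SDK examples

with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.voidai.app/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        # The audio goes in the multipart "file" part...
        files={"file": ("audio.mp3", audio_file, "audio/mpeg")},
        # ...and the remaining parameters are plain form fields
        data={
            "model": "whisper-1",
            "response_format": "json",
            "temperature": "0",
        },
    )

response.raise_for_status()
print(response.json()["text"])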

Response

The response structure varies with response_format:

JSON Response (default)

text
string
The transcribed text.

Verbose JSON Response

task
string
The task performed (transcribe).
language
string
The detected language.
duration
number
Duration of the audio in seconds.
text
string
The transcribed text.
segments
array
Array of transcript segments with timestamps.

Examples

Basic Transcription

from openai import OpenAI

# The standard OpenAI SDK works here by overriding base_url
client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1"
)

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

With Language Hint

with open("spanish_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"
    )
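
With a Prompt

The prompt parameter can steer the spelling of proper nouns and technical terms (see Tips). A short sketch; the terms here are illustrative stand-ins for vocabulary that appears in your own audio:

with open("meeting_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        # Vocabulary the model should spell correctly (illustrative values)
        prompt="VoidAI, Whisper, ISO-639-1"
    )

print(transcript.text)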

SRT Subtitles

with open("video_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

# For srt (and vtt/text), the SDK returns the body as a plain string,
# so it can be written straight to a subtitle file
with open("subtitles.srt", "w") as f:
    f.write(transcript)
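
Plain Text Output

For simple transcripts, response_format="text" drops the JSON wrapper; as with srt, the SDK hands back a plain string:

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

print(transcript)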

Verbose JSON with Timestamps

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")
for segment in transcript.segments:
    # Segments are returned as objects, so use attribute access
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Response Examples

JSON Response

{
  "text": "Hello, this is a test transcription of an audio file."
}

Verbose JSON Response

{
  "task": "transcribe",
  "language": "english",
  "duration": 5.42,
  "text": "Hello, this is a test transcription of an audio file.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test",
      "tokens": [50364, 2425, 11, 341, 307, 257, 1500],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 2.5,
      "end": 5.42,
      "text": " transcription of an audio file.",
      "tokens": [50489, 1112, 11, 295, 364, 6279, 2058],
      "temperature": 0.0,
      "avg_logprob": -0.22,
      "compression_ratio": 1.1,
      "no_speech_prob": 0.02
    }
  ]
}

SRT Response

1
00:00:00,000 --> 00:00:02,500
Hello, this is a test

2
00:00:02,500 --> 00:00:05,420
transcription of an audio file.

Tips

Providing the language parameter improves accuracy, especially for non-English audio or audio with accents.
The prompt parameter can help with proper nouns, technical terms, or specific formatting expectations.
Use srt or vtt for subtitles, verbose_json when you need timestamps, or plain text for simple transcripts.
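
Files larger than the 25 MB limit have to be split before upload. A minimal sketch using the third-party pydub library (which requires ffmpeg), reusing the client from the examples above; the fixed 10-minute chunk length is an arbitrary assumption, and a fixed-interval split can cut words in half, so splitting on silence would be gentler:

from pydub import AudioSegment

CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks; arbitrary choice

audio = AudioSegment.from_file("long_audio.mp3")
parts = []

for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk_path = f"chunk_{i}.mp3"
    # pydub slices by milliseconds
    audio[start:start + CHUNK_MS].export(chunk_path, format="mp3")
    with open(chunk_path, "rb") as chunk_file:
        piece = client.audio.transcriptions.create(
            model="whisper-1",
            file=chunk_file
        )
    parts.append(piece.text)

full_text = " ".join(parts)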