Request Body
This endpoint accepts multipart/form-data.
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB.
The model to use for transcription (e.g., whisper-1).
The language of the audio in ISO-639-1 format (e.g., en, es, fr). Providing the language improves accuracy.
Optional text to guide the model’s style or continue a previous transcript. Should match the audio language.
The output format. Options: json, text, srt, verbose_json, vtt.
Sampling temperature between 0 and 1. Higher values make output more random.
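The limits listed above can be checked on the client before anything is uploaded. The following is a minimal sketch, not the service's own validation: the function name is hypothetical, the field names `model` and `temperature` are assumed (only `language`, `prompt`, and `response_format` are named on this page), and "25 MB" is read as 25 MiB.

```python
from pathlib import Path

# Limits and options copied from the parameter descriptions above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def build_form_fields(filename, file_size, model, language=None, prompt=None,
                      response_format=None, temperature=None):
    """Validate the inputs and return the non-file form fields as strings."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if file_size > MAX_FILE_BYTES:
        raise ValueError("audio file exceeds the 25 MB limit")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    # Field names "model" and "temperature" are assumed, not confirmed by this page.
    fields = {"model": model}
    optional = {
        "language": language,
        "prompt": prompt,
        "response_format": response_format,
        "temperature": temperature,
    }
    for name, value in optional.items():
        if value is not None:
            fields[name] = str(value)  # multipart text fields are sent as strings
    return fields
```

Optional fields that are left as `None` are omitted from the form, so the server-side defaults apply.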
Response
Varies based on response_format:
JSON Response (default)
The transcribed text.
Verbose JSON Response
The task performed (transcribe).
The detected language.
Duration of the audio in seconds.
The transcribed text.
Array of transcript segments with timestamps.
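A verbose response could be consumed along these lines. This is a sketch under assumptions: the key names `task`, `language`, `duration`, `text`, and `segments`, and the per-segment keys `start`, `end`, and `text`, do not appear on this page and are guessed from the field descriptions above. Check them against a real response before relying on them.

```python
import json


def summarize_verbose_response(body):
    """Return a header line plus one line per timestamped segment."""
    data = json.loads(body)
    # Assumed key names; this page only describes the fields, it does not name them.
    lines = [f"{data['task']} ({data['language']}, {data['duration']:.1f}s)"]
    for segment in data.get("segments", []):
        lines.append(
            f"[{segment['start']:.2f} - {segment['end']:.2f}] {segment['text'].strip()}"
        )
    return lines
```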
Examples
Basic Transcription
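A basic request might be assembled as follows, using only the Python standard library. This is a sketch, not the official client: the URL, the bearer-token header, and the field names `file` and `model` are placeholders or assumptions, since this page does not state them.

```python
import uuid
import urllib.request

# Hypothetical placeholder: substitute the real endpoint URL.
API_URL = "https://api.example.com/v1/audio/transcriptions"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(filename, file_bytes, model="whisper-1",
                                api_key="YOUR_API_KEY", **extra_fields):
    """Build (but do not send) a POST request for one audio file."""
    fields = {"model": model, **extra_fields}  # e.g. language="es", response_format="srt"
    body, content_type = encode_multipart(fields, "file", filename, file_bytes)
    headers = {
        "Content-Type": content_type,
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, read the audio with `open(path, "rb")`, pass the bytes to `build_transcription_request`, and hand the result to `urllib.request.urlopen`. Extra keyword arguments such as `language="es"` or `response_format="srt"` are forwarded as additional form fields.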
With Language Hint
SRT Subtitles
Verbose JSON with Timestamps
Response Examples
JSON Response
Verbose JSON Response
SRT Response
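With response_format set to srt, the body is subtitle text, not JSON. The sketch below shows one way to turn such text into (start, end, text) tuples. It assumes the standard SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, one or more text lines, blank line), and the function names are illustrative.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:05,250 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Parse SRT text into a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip empty or malformed blocks
        start, end = lines[1].split("-->")
        # Multi-line captions are joined with spaces.
        cues.append((to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues
```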
Tips
Specify the language
Providing the language parameter improves accuracy, especially for non-English audio or audio with accents.
Use prompts for context
The prompt parameter can help with proper nouns, technical terms, or specific formatting expectations.
Choose the right format
Use srt or vtt for subtitles, verbose_json when you need timestamps, or plain text for simple transcripts.
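The format guidance above can be written as a small helper. This is an illustrative sketch with a hypothetical function name. The choice of vtt for web players is an added assumption, based on WebVTT being the format HTML track elements consume, and is not stated on this page.

```python
def pick_response_format(subtitles=False, timestamps=False, web_player=False):
    """Map what the caller needs to one of the documented response_format values."""
    if subtitles:
        # Assumption: vtt for browser playback, srt for other players.
        return "vtt" if web_player else "srt"
    if timestamps:
        return "verbose_json"
    return "text"
```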