Properties
| Name | Type | Description |
|---|---|---|
| file | string | The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 25 MB. |
| model | string | ID of the model to use. Only whisper-1 and gpt-4o-transcribe are currently available. |
| language | string | The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. |
| prompt | string | An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. |
| response_format | string | The format of the transcript output. Defaults to json. verbose_json includes additional metadata like word-level timestamps. |
| temperature | number | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
| timestamp_granularities | array | The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use this parameter. |
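The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name `check_transcription_request` and its arguments are hypothetical, the 25 MB limit is assumed to mean 25 × 1024 × 1024 bytes, the temperature range is assumed to be inclusive, and the granularity value `"word"` is an assumed example.

```python
from pathlib import PurePath

# Constraints taken from the property descriptions in the table above.
ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_transcription_request(filename, file_size, params):
    """Return a list of problems; an empty list means no documented rule is violated.

    Hypothetical helper for illustration only.
    """
    problems = []

    # file: must use one of the listed formats and stay under the upload limit.
    extension = PurePath(filename).suffix.lstrip(".").lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")
    if file_size > MAX_FILE_BYTES:
        problems.append("file exceeds the 25 MB upload limit")

    # temperature: between 0 and 1 (assumed inclusive).
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")

    # timestamp_granularities: only usable with response_format=verbose_json.
    if "timestamp_granularities" in params and params.get("response_format") != "verbose_json":
        problems.append("timestamp_granularities requires response_format=verbose_json")

    return problems


# response_format is omitted here, so it falls back to the default (json).
params = {
    "model": "whisper-1",
    "language": "en",
    "temperature": 0.2,
    "timestamp_granularities": ["word"],  # assumed example value
}
print(check_transcription_request("meeting.mp3", 3_000_000, params))
# prints ['timestamp_granularities requires response_format=verbose_json']
```

Returning a list of problems instead of raising on the first one lets a caller report every violation at once.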
JSON Schema
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CreateTranscriptionRequest",
  "type": "object",
  "properties": {
    "file": {
      "type": "string",
      "description": "The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 25 MB."
    },
    "model": {
      "type": "string",
      "description": "ID of the model to use. Only whisper-1 and gpt-4o-transcribe are currently available."
    },
    "language": {
      "type": "string",
      "description": "The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency."
    },
    "prompt": {
      "type": "string",
      "description": "An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language."
    },
    "response_format": {
      "type": "string",
      "description": "The format of the transcript output. Defaults to json. verbose_json includes additional metadata like word-level timestamps."
    },
    "temperature": {
      "type": "number",
      "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic."
    },
    "timestamp_granularities": {
      "type": "array",
      "description": "The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use this parameter."
    }
  }
}
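The `type` keywords in the schema can be checked with the standard library alone. This is a rough sketch, not a full JSON Schema validator: the `type_errors` helper and the `SCHEMA_TYPES` mapping are hypothetical names, and only the three types that appear in the schema (`string`, `number`, `array`) are handled.

```python
# Property types copied from the schema above.
SCHEMA_TYPES = {
    "file": "string",
    "model": "string",
    "language": "string",
    "prompt": "string",
    "response_format": "string",
    "temperature": "number",
    "timestamp_granularities": "array",
}

# Python equivalents of the JSON Schema types used in this schema.
JSON_TO_PYTHON = {"string": str, "number": (int, float), "array": list}


def type_errors(payload):
    """Return one message per property whose value has the wrong JSON type."""
    errors = []
    for name, value in payload.items():
        expected = SCHEMA_TYPES.get(name)
        if expected is None:
            # The schema does not set additionalProperties, so unknown keys pass.
            continue
        # bool is a subclass of int in Python but is not a JSON number,
        # and no property in this schema is a boolean.
        if isinstance(value, bool) or not isinstance(value, JSON_TO_PYTHON[expected]):
            errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return errors


payload = {
    "model": "whisper-1",
    "temperature": "0.2",               # wrong: a string, not a number
    "timestamp_granularities": "word",  # wrong: a string, not an array
}
print(type_errors(payload))
# prints ['temperature: expected number, got str', 'timestamp_granularities: expected array, got str']
```

For anything beyond type checks, a dedicated JSON Schema validator is the more robust choice.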