OpenAI · Schema

CreateTranscriptionRequest

AI · Artificial Intelligence · Large Language Models · T1

Properties

Name	Type	Description
file	string	The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 25 MB.
model	string	ID of the model to use. Only whisper-1 and gpt-4o-transcribe are currently available.
language	string	The language of the input audio. Supplying the input language in ISO 639-1 format (for example, en) improves accuracy and latency.
prompt	string	Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.
response_format	string	The format of the transcript output. Defaults to json. verbose_json includes additional metadata like word-level timestamps.
temperature	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
timestamp_granularities	array	The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use this parameter.
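
The constraints stated in the table (accepted formats, the 25 MB limit, the temperature range, and the dependency of timestamp_granularities on verbose_json) can be checked before a request is sent. The following is a minimal sketch, not an official client: the function name build_transcription_fields and its signature are hypothetical, the 25 MB limit is assumed to mean 25 * 1024 * 1024 bytes, the temperature range is assumed to be inclusive, and the granularity value "word" is used only for illustration.

```python
import os

# Formats listed in the description of `file`.
ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def build_transcription_fields(filename, size_bytes, model, language=None,
                               prompt=None, response_format="json",
                               temperature=None, timestamp_granularities=None):
    """Hypothetical helper: return request fields, or raise ValueError."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 25 MB upload limit")
    # Assumption: the bounds 0 and 1 are themselves allowed.
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format='verbose_json'")

    # `file` holds only the name here; the audio content itself would be
    # attached by whatever HTTP client sends the request.
    fields = {"file": filename, "model": model, "response_format": response_format}
    optional = {
        "language": language,
        "prompt": prompt,
        "temperature": temperature,
        "timestamp_granularities": timestamp_granularities,
    }
    # Leave out optional fields that were not supplied.
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields
```

Calling build_transcription_fields("talk.mp3", 1024, "whisper-1", timestamp_granularities=["word"]) raises ValueError, because response_format is still the default json.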

JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CreateTranscriptionRequest",
  "type": "object",
  "properties": {
    "file": {
      "type": "string",
      "description": "The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 25 MB."
    },
    "model": {
      "type": "string",
      "description": "ID of the model to use. Only whisper-1 and gpt-4o-transcribe are currently available."
    },
    "language": {
      "type": "string",
      "description": "The language of the input audio. Supplying the input language in ISO 639-1 format (for example, en) improves accuracy and latency."
    },
    "prompt": {
      "type": "string",
      "description": "Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio."
    },
    "response_format": {
      "type": "string",
      "description": "The format of the transcript output. Defaults to json. verbose_json includes additional metadata like word-level timestamps."
    },
    "temperature": {
      "type": "number",
      "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic."
    },
    "timestamp_granularities": {
      "type": "array",
      "description": "The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use this parameter."
    }
  }
}
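
The schema above declares only a type for each property, so a payload can be type-checked with the standard library alone. The sketch below is illustrative rather than a full JSON Schema validator: the names PROPERTY_TYPES, JSON_TYPE_CHECKS, and type_errors are hypothetical, and the property types are copied by hand from the schema.

```python
# Property types copied from the CreateTranscriptionRequest schema.
PROPERTY_TYPES = {
    "file": "string",
    "model": "string",
    "language": "string",
    "prompt": "string",
    "response_format": "string",
    "temperature": "number",
    "timestamp_granularities": "array",
}

# Map each JSON Schema type used above to a Python check.
# bool is excluded from "number" because it is a subclass of int in Python.
JSON_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}


def type_errors(payload):
    """Return a list of messages for values whose type differs from the schema."""
    errors = []
    for name, value in payload.items():
        expected = PROPERTY_TYPES.get(name)
        if expected is None:
            # The schema does not restrict additional properties, so skip them.
            continue
        if not JSON_TYPE_CHECKS[expected](value):
            errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return errors
```

Because the schema as shown has no "required" list, an empty payload produces no errors under this check; any rule about mandatory fields would have to be enforced separately.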