Vapi · Schema

Vapi Assistant

JSON Schema for the Vapi Assistant resource as returned by the Vapi API.

AI, Voice AI, Voice Agents, Conversational AI, Telephony, Real-Time, Transcription, Text-to-Speech, LLM, Agents, MCP

Properties

Name	Type	Description
transcriber	object	These are the options for the assistant's transcriber.
model	object	These are the options for the assistant's LLM.
voice	object	These are the options for the assistant's voice.
firstMessage	string	This is the first message that the assistant will say. This can also be a URL to a containerized audio file (mp3, wav, etc.). If unspecified, assistant will wait for user to speak and use the model to
firstMessageInterruptionsEnabled	boolean
firstMessageMode	string	This is the mode for the first message. Default is 'assistant-speaks-first'. Use: - 'assistant-speaks-first' to have the assistant speak first. - 'assistant-waits-for-user' to have the assistant wait
voicemailDetection	object	These are the settings to configure or disable voicemail detection. Alternatively, voicemail detection can be configured using the model.tools=[VoicemailTool]. By default, voicemail detection is disab
clientMessages	array	These are the messages that will be sent to your Client SDKs. Default is conversation-update,function-call,hang,model-output,speech-update,status-update,transfer-update,transcript,tool-calls,user-inte
serverMessages	array	These are the messages that will be sent to your Server URL. Default is conversation-update,end-of-call-report,function-call,hang,speech-update,status-update,tool-calls,transfer-destination-request,ha
maxDurationSeconds	number	This is the maximum number of seconds that the call will last. When the call reaches this duration, it will be ended. @default 600 (10 minutes)
backgroundSound	object	This is the background sound in the call. Default for phone calls is 'office' and default for web calls is 'off'. You can also provide a custom sound by providing a URL to an audio file.
modelOutputInMessagesEnabled	boolean	This determines whether the model's output is used in conversation history rather than the transcription of assistant's speech. @default false
transportConfigurations	array	These are the configurations to be passed to the transport providers of assistant's calls, like Twilio. You can store multiple configurations for different transport providers. For a call, only the co
observabilityPlan	object	This is the plan for observability of assistant's calls. Currently, only Langfuse is supported.
credentials	array	These are dynamic credentials that will be used for the assistant calls. By default, all the credentials are available for use in the call, but you can supply additional credentials using this.
hooks	array	This is a set of actions that will be performed on certain events.
name	string	This is the name of the assistant. This is required when you want to transfer between assistants in a call.
voicemailMessage	string	This is the message that the assistant will say if the call is forwarded to voicemail. If unspecified, it will hang up.
endCallMessage	string	This is the message that the assistant will say if it ends the call. If unspecified, it will hang up without saying anything.
endCallPhrases	array	This list contains phrases that, if spoken by the assistant, will trigger the call to be hung up. Case insensitive.
compliancePlan	object
metadata	object	This is for metadata you want to store on the assistant.
backgroundSpeechDenoisingPlan	object	This enables filtering of noise and background speech while the user is talking. Features: - Smart denoising using Krisp - Fourier denoising Smart denoising can be combined with or used independently
analysisPlan	object	This is the plan for analysis of assistant's calls. Stored in `call.analysis`.
artifactPlan	object	This is the plan for artifacts generated during assistant's calls. Stored in `call.artifact`.
startSpeakingPlan	object	This is the plan for when the assistant should start talking. You should configure this if you're running into these issues: - The assistant is too slow to start talking after the customer is done spe
stopSpeakingPlan	object	This is the plan for when assistant should stop talking on customer interruption. You should configure this if you're running into these issues: - The assistant is too slow to recognize customer's int
monitorPlan	object	This is the plan for real-time monitoring of the assistant's calls. Usage: - To enable live listening of the assistant's calls, set `monitorPlan.listenEnabled` to `true`. - To enable live control of t
credentialIds	array	These are the credentials that will be used for the assistant calls. By default, all the credentials are available for use in the call but you can provide a subset using this.
server	object	This is where Vapi will send webhooks. You can find all webhooks available along with their shape in ServerMessage schema. The order of precedence is: 1. assistant.server.url 2. phoneNumber.serverUrl
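
The `transcriber` and `model` properties above are provider-discriminated: in the schema below, each is a `oneOf` whose variants pin `provider` to a single enum value and carry their own `required` list. The following is a minimal sketch, not an official Vapi client, of checking a payload against those `required` lists before sending it. The helper `missing_required`, the table `REQUIRED_BY_PROVIDER`, and the sample payload values are hypothetical and exist only for illustration. Only variants whose `required` list appears in the schema below are covered.

```python
import json

# Required keys per provider, copied from the `required` arrays in the
# JSON Schema below. Hypothetical lookup table, for illustration only.
REQUIRED_BY_PROVIDER = {
    "transcriber": {
        "assembly-ai": ["provider"],
        "azure": ["provider"],
        "custom-transcriber": ["provider", "server"],
        "deepgram": ["provider"],
        "11labs": ["provider"],
    },
    "model": {
        "anthropic": ["model", "provider"],
        "anthropic-bedrock": ["provider", "model"],
        "anyscale": ["provider", "model"],
        "cerebras": ["model", "provider"],
    },
}


def missing_required(section: str, config: dict) -> list[str]:
    """Return the required keys absent from `config`.

    Raises ValueError if `provider` is not one of the variants listed above.
    """
    provider = config.get("provider")
    variants = REQUIRED_BY_PROVIDER[section]
    if provider not in variants:
        raise ValueError(f"unknown {section} provider: {provider!r}")
    return [key for key in variants[provider] if key not in config]


# Hypothetical assistant payload; the values are illustrative only.
assistant = {
    "name": "Support Line",
    "firstMessage": "Hello, how can I help you today?",
    "firstMessageMode": "assistant-speaks-first",
    "maxDurationSeconds": 600,
    "transcriber": {"provider": "assembly-ai", "language": "en"},
    "model": {
        "provider": "anthropic",
        "model": "claude-3-5-sonnet-20241022",
        "temperature": 0,
        "maxTokens": 250,
    },
}

problems = {
    section: missing_required(section, assistant[section])
    for section in ("transcriber", "model")
}
print(json.dumps(problems))
```

An empty list for each section means the variant's `required` keys are all present. This check does not validate enum values or types; a full JSON Schema validator would be needed for that.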

JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api-evangelist.com/schemas/vapi/vapi-assistant-schema.json",
  "title": "Vapi Assistant",
  "description": "JSON Schema for the Vapi Assistant resource as returned by the Vapi API.",
  "type": "object",
  "properties": {
    "transcriber": {
      "description": "These are the options for the assistant's transcriber.",
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "provider": {
              "type": "string",
              "description": "This is the transcription provider that will be used.",
              "enum": [
                "assembly-ai"
              ]
            },
            "language": {
              "type": "string",
              "description": "This is the language that will be set for the transcription.",
              "enum": [
                "multi",
                "en"
              ]
            },
            "confidenceThreshold": {
              "type": "number",
              "description": "Transcripts below this confidence threshold will be discarded.\n\n@default 0.4"
            },
            "formatTurns": {
              "type": "boolean",
              "description": "This enables formatting of transcripts.\n\n@default true"
            },
            "endOfTurnConfidenceThreshold": {
              "type": "number",
              "description": "This is the end of turn confidence threshold. The minimum confidence that the end of turn is detected.\nNote: Only used if startSpeakingPlan.smartEndpointingPlan is not set.\n@min 0\n@max 1\n@default 0.7"
            },
            "minEndOfTurnSilenceWhenConfident": {
              "type": "number",
              "description": "This is the minimum end of turn silence when confident in milliseconds.\nNote: Only used if startSpeakingPlan.smartEndpointingPlan is not set.\n@default 160"
            },
            "wordFinalizationMaxWaitTime": {
              "type": "number"
            },
            "maxTurnSilence": {
              "type": "number",
              "description": "This is the maximum turn silence time in milliseconds.\nNote: Only used if startSpeakingPlan.smartEndpointingPlan is not set.\n@default 400"
            },
            "vadAssistedEndpointingEnabled": {
              "type": "boolean",
              "description": "Use VAD to assist with endpointing decisions from the transcriber.\nWhen enabled, transcriber endpointing will be buffered if VAD detects the user is still speaking, preventing premature turn-taking.\nWhen disabled, transcriber endpointing will be used immediately regardless of VAD state, allowing for"
            },
            "speechModel": {
              "type": "string",
              "description": "This is the speech model used for the streaming session.\nNote: Keyterms prompting is not supported with multilingual streaming.\n@default 'universal-streaming-english'",
              "enum": [
                "universal-streaming-english",
                "universal-streaming-multilingual"
              ]
            },
            "realtimeUrl": {
              "type": "string",
              "description": "The WebSocket URL that the transcriber connects to."
            },
            "wordBoost": {
              "type": "array",
              "description": "Add up to 2500 characters of custom vocabulary.",
              "items": {
                "type": "object"
              }
            },
            "keytermsPrompt": {
              "type": "array",
              "description": "Keyterms prompting improves recognition accuracy for specific words and phrases.\nCan include up to 100 keyterms, each up to 50 characters.\nCosts an additional $0.04/hour when enabled.",
              "items": {
                "type": "object"
              }
            },
            "endUtteranceSilenceThreshold": {
              "type": "number",
              "description": "The duration of the end utterance silence threshold in milliseconds."
            },
            "disablePartialTranscripts": {
              "type": "boolean",
              "description": "Disable partial transcripts.\nSet to `true` to not receive partial transcripts. Defaults to `false`."
            },
            "fallbackPlan": {
              "description": "This is the plan for transcriber provider fallbacks in the event that the primary transcriber provider fails."
            }
          },
          "required": [
            "provider"
          ]
        },
        {
          "type": "object",
          "properties": {
            "provider": {
              "type": "string",
              "description": "This is the transcription provider that will be used.",
              "enum": [
                "azure"
              ]
            },
            "language": {
              "type": "string",
              "description": "This is the language that will be set for the transcription. The list of languages Azure supports can be found here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt",
              "enum": [
                "af-ZA",
                "am-ET",
                "ar-AE",
                "ar-BH",
                "ar-DZ",
                "ar-EG",
                "ar-IL",
                "ar-IQ",
                "ar-JO",
                "ar-KW"
              ]
            },
            "segmentationStrategy": {
              "type": "string",
              "description": "Controls how phrase boundaries are detected, enabling either simple time/silence heuristics or more advanced semantic segmentation.",
              "enum": [
                "Default",
                "Time",
                "Semantic"
              ]
            },
            "segmentationSilenceTimeoutMs": {
              "type": "number",
              "description": "Duration of detected silence after which the service finalizes a phrase. Configure to adjust sensitivity to pauses in speech."
            },
            "segmentationMaximumTimeMs": {
              "type": "number",
              "description": "Maximum duration a segment can reach before being cut off when using time-based segmentation."
            },
            "fallbackPlan": {
              "description": "This is the plan for transcriber provider fallbacks in the event that the primary transcriber provider fails."
            }
          },
          "required": [
            "provider"
          ]
        },
        {
          "type": "object",
          "properties": {
            "provider": {
              "type": "string",
              "description": "This is the transcription provider that will be used. Use `custom-transcriber` for providers that are not natively supported.",
              "enum": [
                "custom-transcriber"
              ]
            },
            "server": {
              "description": "This is where the transcription request will be sent.\n\nUsage:\n1. Vapi will initiate a websocket connection with `server.url`.\n\n2. Vapi will send an initial text frame with the sample rate. Format:\n```\n    {\n      \"type\": \"start\",\n      \"encoding\": \"linear16\", // 16-bit raw PCM format\n      \"containe"
            },
            "fallbackPlan": {
              "description": "This is the plan for transcriber provider fallbacks in the event that the primary transcriber provider fails."
            }
          },
          "required": [
            "provider",
            "server"
          ]
        },
        {
          "type": "object",
          "properties": {
            "provider": {
              "type": "string",
              "description": "This is the transcription provider that will be used.",
              "enum": [
                "deepgram"
              ]
            },
            "model": {
              "description": "This is the Deepgram model that will be used. A list of models can be found here: https://developers.deepgram.com/docs/models-languages-overview"
            },
            "language": {
              "type": "string",
              "description": "This is the language that will be set for the transcription. The list of languages Deepgram supports can be found here: https://developers.deepgram.com/docs/models-languages-overview",
              "enum": [
                "ar",
                "az",
                "ba",
                "be",
                "bg",
                "bn",
                "br",
                "bs",
                "ca",
                "cs"
              ]
            },
            "smartFormat": {
              "type": "boolean",
              "description": "This enables the smart format option provided by Deepgram. It is disabled by default because it can sometimes format numbers as times, but it is getting better."
            },
            "mipOptOut": {
              "type": "boolean",
              "description": "If set to true, this will add mip_opt_out=true as a query parameter of all API requests. See https://developers.deepgram.com/docs/the-deepgram-model-improvement-partnership-program#want-to-opt-out\n\nThis will only be used if you are using your own Deepgram API key.\n\n@default false"
            },
            "numerals": {
              "type": "boolean",
              "description": "If set to true, this will cause deepgram to convert spoken numbers to literal numerals. For example, \"my phone number is nine-seven-two...\" would become \"my phone number is 972...\"\n\n@default false"
            },
            "profanityFilter": {
              "type": "boolean",
              "description": "If set to true, Deepgram will replace profanity in transcripts with surrounding asterisks, e.g. \"f***\".\n\n@default false"
            },
            "redaction": {
              "type": "array",
              "description": "Enables redaction of sensitive information from transcripts.\n\nOptions include:\n- \"pci\": Redacts credit card numbers, expiration dates, and CVV.\n- \"pii\": Redacts personally identifiable information (names, locations, identifying numbers, etc.).\n- \"phi\": Redacts protected health information (medical c",
              "items": {
                "type": "string",
                "enum": [
                  "pci",
                  "pii",
                  "phi",
                  "numbers"
                ]
              }
            },
            "confidenceThreshold": {
              "type": "number",
              "description": "Transcripts below this confidence threshold will be discarded.\n\n@default 0.4"
            },
            "eotThreshold": {
              "type": "number",
              "description": "End-of-turn confidence required to finish a turn. Only used with Flux models.\n\n@default 0.7"
            },
            "eotTimeoutMs": {
              "type": "number",
              "description": "A turn will be finished when this much time has passed after speech, regardless of EOT confidence. Only used with Flux models.\n\n@default 5000"
            },
            "languages": {
              "type": "array",
              "description": "Language hints to bias Flux Multilingual (`flux-general-multi`) toward specific languages.\nProvide BCP-47 language codes (e.g. \"en\", \"es\", \"fr\"). Multiple hints can be given for\nmultilingual or code-switching scenarios. Omit for auto-detection. Only used with `flux-general-multi`.",
              "items": {
                "type": "object"
              }
            },
            "keywords": {
              "type": "array",
              "description": "These keywords are passed to the transcription model to help it pick up use-case specific words. Anything that may not be a common word, like your company name, should be added here.",
              "items": {
                "type": "object"
              }
            },
            "keyterm": {
              "type": "array",
              "description": "Keyterm Prompting allows you to improve Keyword Recall Rate (KRR) for important keyterms or phrases up to 90%.",
              "items": {
                "type": "object"
              }
            },
            "endpointing": {
              "type": "number",
              "description": "This is the timeout after which Deepgram will send transcription on user silence. You can read in-depth documentation here: https://developers.deepgram.com/docs/endpointing.\n\nHere are the most important bits:\n- Defaults to 10. This is recommended for most use cases to optimize for latency.\n- 10 can "
            },
            "fallbackPlan": {
              "description": "This is the plan for transcriber provider fallbacks in the event that the primary transcriber provider fails."
            }
          },
          "required": [
            "provider"
          ]
        },
        {
          "type": "object",
          "properties": {
            "provider": {
              "type": "string",
              "description": "This is the transcription provider that will be used.",
              "enum": [
                "11labs"
              ]
            },
            "model": {
              "description": "This is the model that will be used for the transcription."
            },
            "language": {
              "type": "string",
              "description": "This is the language that will be used for the transcription.",
              "enum": [
                "aa",
                "ab",
                "ae",
                "af",
                "ak",
                "am",
                "an",
                "ar",
                "as",
                "av"
              ]
            },
            "silenceThresholdSeconds": {
              "type": "number",
              "description": "This is the number of seconds of silence before VAD commits (0.3-3.0)."
            },
            "confidenceThreshold": {
              "type": "number",
              "description": "This is the VAD sensitivity (0.1-0.9, lower indicates more sensitive)."
            },
            "minSpeechDurationMs": {
              "type": "number",
              "description": "This is the minimum speech duration for VAD (50-2000ms)."
            },
            "minSilenceDurationMs": {
              "type": "number",
              "description": "This is the minimum silence duration for VAD (50-2000ms)."
            },
            "fallbackPlan": {
              "description": "This is the plan for transcriber provider fallbacks in the event that the primary transcriber provider fails."
            }
          },
          "required": [
            "provider"
          ]
        }
      ]
    },
    "model": {
      "description": "These are the options for the assistant's LLM.",
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "messages": {
              "type": "array",
              "description": "This is the starting state for the conversation.",
              "items": {
                "type": "object"
              }
            },
            "tools": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use existing tools, use `toolIds`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "toolIds": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use transient tools, use `tools`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "knowledgeBase": {
              "description": "These are the options for the knowledge base."
            },
            "model": {
              "type": "string",
              "description": "The specific Anthropic/Claude model that will be used.",
              "enum": [
                "claude-3-opus-20240229",
                "claude-3-sonnet-20240229",
                "claude-3-haiku-20240307",
                "claude-3-5-sonnet-20240620",
                "claude-3-5-sonnet-20241022",
                "claude-3-5-haiku-20241022",
                "claude-3-7-sonnet-20250219",
                "claude-opus-4-20250514",
                "claude-opus-4-5-20251101",
                "claude-opus-4-6"
              ]
            },
            "provider": {
              "type": "string",
              "description": "The provider identifier for Anthropic.",
              "enum": [
                "anthropic"
              ]
            },
            "thinking": {
              "description": "Optional configuration for Anthropic's thinking feature.\nOnly applicable for claude-3-7-sonnet-20250219 model.\nIf provided, maxTokens must be greater than thinking.budgetTokens."
            },
            "temperature": {
              "type": "number",
              "description": "This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency."
            },
            "maxTokens": {
              "type": "number",
              "description": "This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250."
            },
            "emotionRecognitionEnabled": {
              "type": "boolean",
              "description": "This determines whether we detect the user's emotion while they speak and send it as additional info to the model.\n\nDefault `false` because the model is usually good at understanding the user's emotion from text.\n\n@default false"
            },
            "numFastTurns": {
              "type": "number",
              "description": "This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. Example, gpt-3.5-turbo if provider is openai.\n\nDefault is 0.\n\n@default 0"
            }
          },
          "required": [
            "model",
            "provider"
          ]
        },
        {
          "type": "object",
          "properties": {
            "messages": {
              "type": "array",
              "description": "This is the starting state for the conversation.",
              "items": {
                "type": "object"
              }
            },
            "tools": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use existing tools, use `toolIds`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "toolIds": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use transient tools, use `tools`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "knowledgeBase": {
              "description": "These are the options for the knowledge base."
            },
            "provider": {
              "type": "string",
              "description": "The provider identifier for Anthropic via AWS Bedrock.",
              "enum": [
                "anthropic-bedrock"
              ]
            },
            "model": {
              "type": "string",
              "description": "The specific Anthropic/Claude model that will be used via Bedrock.",
              "enum": [
                "claude-3-opus-20240229",
                "claude-3-sonnet-20240229",
                "claude-3-haiku-20240307",
                "claude-3-5-sonnet-20240620",
                "claude-3-5-sonnet-20241022",
                "claude-3-5-haiku-20241022",
                "claude-3-7-sonnet-20250219",
                "claude-opus-4-20250514",
                "claude-opus-4-5-20251101",
                "claude-opus-4-6"
              ]
            },
            "thinking": {
              "description": "Optional configuration for Anthropic's thinking feature.\nOnly applicable for claude-3-7-sonnet-20250219 model.\nIf provided, maxTokens must be greater than thinking.budgetTokens."
            },
            "temperature": {
              "type": "number",
              "description": "This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency."
            },
            "maxTokens": {
              "type": "number",
              "description": "This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250."
            },
            "emotionRecognitionEnabled": {
              "type": "boolean",
              "description": "This determines whether we detect the user's emotion while they speak and send it as additional info to the model.\n\nDefault `false` because the model is usually good at understanding the user's emotion from text.\n\n@default false"
            },
            "numFastTurns": {
              "type": "number",
              "description": "This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. Example, gpt-3.5-turbo if provider is openai.\n\nDefault is 0.\n\n@default 0"
            }
          },
          "required": [
            "provider",
            "model"
          ]
        },
        {
          "type": "object",
          "properties": {
            "messages": {
              "type": "array",
              "description": "This is the starting state for the conversation.",
              "items": {
                "type": "object"
              }
            },
            "tools": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use existing tools, use `toolIds`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "toolIds": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use transient tools, use `tools`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "knowledgeBase": {
              "description": "These are the options for the knowledge base."
            },
            "provider": {
              "type": "string",
              "enum": [
                "anyscale"
              ]
            },
            "model": {
              "type": "string",
              "description": "This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b"
            },
            "temperature": {
              "type": "number",
              "description": "This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency."
            },
            "maxTokens": {
              "type": "number",
              "description": "This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250."
            },
            "emotionRecognitionEnabled": {
              "type": "boolean",
              "description": "This determines whether we detect the user's emotion while they speak and send it as additional info to the model.\n\nDefault `false` because the model is usually good at understanding the user's emotion from text.\n\n@default false"
            },
            "numFastTurns": {
              "type": "number",
              "description": "This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. Example, gpt-3.5-turbo if provider is openai.\n\nDefault is 0.\n\n@default 0"
            }
          },
          "required": [
            "provider",
            "model"
          ]
        },
        {
          "type": "object",
          "properties": {
            "messages": {
              "type": "array",
              "description": "This is the starting state for the conversation.",
              "items": {
                "type": "object"
              }
            },
            "tools": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use existing tools, use `toolIds`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "toolIds": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use transient tools, use `tools`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "knowledgeBase": {
              "description": "These are the options for the knowledge base."
            },
            "model": {
              "type": "string",
              "description": "This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b",
              "enum": [
                "llama3.1-8b",
                "llama-3.3-70b"
              ]
            },
            "provider": {
              "type": "string",
              "enum": [
                "cerebras"
              ]
            },
            "temperature": {
              "type": "number",
              "description": "This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency."
            },
            "maxTokens": {
              "type": "number",
              "description": "This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250."
            },
            "emotionRecognitionEnabled": {
              "type": "boolean",
              "description": "This determines whether we detect the user's emotion while they speak and send it as additional info to the model.\n\nDefault `false` because the model is usually good at understanding the user's emotion from text.\n\n@default false"
            },
            "numFastTurns": {
              "type": "number",
              "description": "This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. Example, gpt-3.5-turbo if provider is openai.\n\nDefault is 0.\n\n@default 0"
            }
          },
          "required": [
            "model",
            "provider"
          ]
        },
        {
          "type": "object",
          "properties": {
            "messages": {
              "type": "array",
              "description": "This is the starting state for the conversation.",
              "items": {
                "type": "object"
              }
            },
            "tools": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use existing tools, use `toolIds`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "toolIds": {
              "type": "array",
              "description": "These are the tools that the assistant can use during the call. To use transient tools, use `tools`.\n\nBoth `tools` and `toolIds` can be used together.",
              "items": {
                "type": "object"
              }
            },
            "knowledgeBase": {
              "description": "These are the options for the knowledge base."
            },
            "provider": {
              "type": "string",
              "description": "This is the provider that will be used for the model. Any service, including your own server, that is compatible with the OpenAI API can be used.",
              "enum": [
                "custom-llm"
              ]
            },
            "metadataSendMode": {
              "type": "string",
              "description": "This determines whether metadata is sent in requests to the custom provider.\n\n- `off` will not send any metadata. The payload will look like `{ messages }`\n- `variable` will send `assistant.metadata` as a variable on the payload. The payload will look like `{ messages, metadata }`\n- `destructured` will send",
              "enum": [
                "off",
                "variable",
                "destructured"
              ]
            },
            "headers": {
              "type": "object",
              "description": "Custom headers to send with requests. These headers can override default OpenAI headers except for Authorization (which should be specified using a custom-llm credential)."
            },
            "url": {
              "type": "string",
              "description": "This is the URL we'll use for the OpenAI client's `baseURL`. Ex. https://openrouter.ai/api/v1"
            },
            "wordLevelConfidenceEnabled": {
              "type": "boolean",
              "description": "This determines whether the transcriber's word-level confidence is sent in requests to the custom provider. Default is false.\nThis only works for Deepgram transcribers."
            },
            "timeoutSeconds": {
              "type": "number",
              "description": "This sets how long the connection to the custom provider may stay open without any tokens being streamed back before it times out. Default is 20 seconds."
            },
            "model": {
              "type": "string",
              "description": "This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b"
            },
            "temperature": {
              "type": "number",
              "description": "This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency."
            },
            "maxTokens": {
              "type": "number",
              "description": "This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250."
            },
            "emotionRecognitionEnabled": {
              "type": "boolean",
              "description": "This determines whether we detect the user's emotion while they speak and send it as additional information to the model.\n\nDefault `false` because the model is usually good at understanding the user's emotion from text.\n\n@default false"
            },
            "numFastTurns": {
              "type": "number",
              "description": "This sets how many turns at the start of the conversation use a smaller, faster model from the same provider before switching to the primary model. For example, gpt-3.5-turbo if the provider is openai.\n\nDefault is 0.\n\n@default 0"
            }
          },
          "required": [
            "provider",
            "url",
            "model"
          ]
        }
      ]
    },
    "voice": {
      "description": "These are the options for the assistant's voice.",
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "cachingEnabled": {
              "type": "boolean",
              "description": "This is the flag to toggle voice caching for the assistant."
            },
            "provider": {
              "type": "string",
              "description": "This is the voice provider that will be used.",
              "enum": [
                "azure"
              ]
            },
            "voiceId": {
              "description": "This is the provider-specific ID that will be used."
            },
            "chunkPlan": {
              "description": "This is the plan for chunking the model output before it is sent to the voice provider."
            },
            "speed": {
              "type": "number",
              "description": "This is the speed multiplier that will be used."
            },
            "fallbackPlan": {
              "description": "This is the plan for voice provider fallbacks in the event that the primary voice provider fails."
            }
          },
          "required": [
            "provider",
            "voiceId"
          ]
        },
        {
          "type"

# --- truncated at 32 KB (77 KB total) ---
# Full source: https://raw.githubusercontent.com/api-evangelist/vapi-ai/refs/heads/main/json-schema/vapi-assistant-schema.json