Vapi · Schema

EvalCustomModel

AIVoiceAgentsRealtimeCPaaS

Properties

Name	Type	Description
provider	string	This is the provider of the model (`custom-llm`).
url	string	These is the URL we'll use for the OpenAI client's `baseURL`. Ex. https://openrouter.ai/api/v1
headers	object	These are the headers we'll use for the OpenAI client's `headers`.
timeoutSeconds	number	This sets the timeout for the connection to the custom provider without needing to stream any tokens back. Default is 20 seconds.
model	string	This is the name of the model. Ex. gpt-4o
temperature	number	This is the temperature of the model. For LLM-as-a-judge, it's recommended to set it between 0 - 0.3 to avoid hallucinations and ensure the model judges the output correctly based on the instructions.
maxTokens	number	This is the max tokens of the model. If your Judge instructions return `true` or `false` takes only 1 token (as per the OpenAI Tokenizer), and therefore is recommended to set it to a low number to for
messages	array	These are the messages which will instruct the AI Judge on how to evaluate the assistant message. The LLM-Judge must respond with "pass" or "fail" to indicate if the assistant message passes the eval.

View JSON Schema on GitHub

JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "#/components/schemas/EvalCustomModel",
  "title": "EvalCustomModel",
  "type": "object",
  "properties": {
    "provider": {
      "type": "string",
      "description": "This is the provider of the model (`custom-llm`).",
      "enum": [
        "custom-llm"
      ]
    },
    "url": {
      "type": "string",
      "description": "These is the URL we'll use for the OpenAI client's `baseURL`. Ex. https://openrouter.ai/api/v1"
    },
    "headers": {
      "type": "object",
      "description": "These are the headers we'll use for the OpenAI client's `headers`."
    },
    "timeoutSeconds": {
      "type": "number",
      "description": "This sets the timeout for the connection to the custom provider without needing to stream any tokens back. Default is 20 seconds.",
      "minimum": 20,
      "maximum": 600
    },
    "model": {
      "type": "string",
      "description": "This is the name of the model. Ex. gpt-4o",
      "maxLength": 100
    },
    "temperature": {
      "type": "number",
      "description": "This is the temperature of the model. For LLM-as-a-judge, it's recommended to set it between 0 - 0.3 to avoid hallucinations and ensure the model judges the output correctly based on the instructions.",
      "minimum": 0,
      "maximum": 2
    },
    "maxTokens": {
      "type": "number",
      "description": "This is the max tokens of the model.\nIf your Judge instructions return `true` or `false` takes only 1 token (as per the OpenAI Tokenizer), and therefore is recommended to set it to a low number to force the model to return a short response.",
      "minimum": 50,
      "maximum": 10000
    },
    "messages": {
      "description": "These are the messages which will instruct the AI Judge on how to evaluate the assistant message.\nThe LLM-Judge must respond with \"pass\" or \"fail\" to indicate if the assistant message passes the eval.\n\nTo access the messages in the mock conversation, use the LiquidJS variable `{{messages}}`.\nThe assistant message to be evaluated will be passed as the last message in the `messages` array and can be accessed using `{{messages[-1]}}`.\n\nIt is recommended to use the system message to instruct the LLM how to evaluate the assistant message, and then use the first user message to pass the assistant message to be evaluated.",
      "example": "{",
      "type": "array",
      "items": {
        "type": "object"
      }
    }
  },
  "required": [
    "provider",
    "url",
    "model",
    "messages"
  ]
}