Baseten · Schema

ChatCompletionRequest

Request body for creating a chat completion.

AIMLInferenceDeploymentMLOpsOpenAI CompatibleAnthropic CompatibleTruss

Properties

Name	Type	Description
messages	array	A list of messages representing the conversation history. Supports roles: `system`, `user`, `assistant`, and `tool`.
model	string	The model slug to use for completion, such as `deepseek-ai/DeepSeek-V4-Pro`. Find available models at [Model APIs](https://app.baseten.co/model-apis/create).
frequency_penalty	number	Penalizes tokens based on how frequently they appear in the text so far. Positive values decrease repetition. Support varies by model.
logit_bias	object	A map of token IDs to bias values (-100 to 100). Use this to increase or decrease the likelihood of specific tokens appearing in the output.
logprobs	boolean	If `true`, returns log probabilities of the output tokens. Log probability support varies by model.
top_logprobs	integer	Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`. Log probability support varies by model.
max_tokens	integer	Maximum number of tokens to generate. If your request input plus `max_tokens` exceeds the model's context length, `max_tokens` is truncated. If your request exceeds the context length by more than 16k
n	integer	Number of completions to generate. Only `1` is supported.
presence_penalty	number	Penalizes tokens based on whether they have appeared in the text so far. Positive values encourage the model to discuss new topics. Support varies by model.
response_format	object	Specifies the output format. Use `{"type": "json_object"}` for JSON mode, or `{"type": "json_schema", "json_schema": {...}}` for structured outputs with a specific schema.
seed	integer	Random seed for deterministic generation. Determinism is not guaranteed across different hardware or model versions.
stop	object	Up to 32 sequences where the API stops generating further tokens. Can be a string or array of strings.
stream	boolean	If `true`, responses are streamed back as server-sent events (SSE) as they are generated.
stream_options	object	Options for streaming responses. Set `include_usage: true` to receive token usage statistics in the final chunk.
temperature	number	Controls randomness in the output. Lower values like 0.2 produce more focused and deterministic responses. Higher values like 1.5 produce more creative and varied output.
top_p	number	Nucleus sampling: only consider tokens with cumulative probability up to this value. Lower values like 0.1 produce more focused output.
tools	array	A list of tools (functions) the model may call. Each tool should have a `type: "function"` and a `function` object with `name`, `description`, and `parameters`.
tool_choice	object	Controls which tool (if any) the model calls. - `none`: Never call a tool. - `auto`: Model decides whether to call a tool. - `required`: Model must call at least one tool. - `{"type": "function", "fun
parallel_tool_calls	boolean	If `true`, the model can call multiple tools in a single response.
user	string	A unique identifier for the end-user, useful for tracking and abuse detection.
best_of	integer	Number of candidate sequences to generate and return the best from. Only a value of 1 is supported.
top_k	integer	Limits token selection to the top K most probable tokens at each step. Lower values like 10 produce more focused output. Set to -1 to disable.
top_p_min	number	Minimum value for dynamic `top_p`. When set, `top_p` dynamically adjusts but does not go below this value.
min_p	number	Minimum probability threshold for token selection. Filters out tokens with probability below `min_p * max_probability`.
repetition_penalty	number	Multiplicative penalty for repeated tokens. Values greater than 1.0 discourage repetition, values less than 1.0 encourage it.
length_penalty	number	Exponential penalty applied to sequence length during beam search. Values greater than 1.0 favor longer sequences.
early_stopping	boolean	If `true`, stops generation when at least `n` complete candidates are found.
bad	object	Words or phrases to avoid in the output. Support varies by model.
bad_token_ids	array	Token IDs to avoid in the output. Support varies by model.
stop_token_ids	array	List of token IDs that cause generation to stop when encountered.
include_stop_str_in_output	boolean	If `true`, includes the matched stop string in the output.
ignore_eos	boolean	If `true`, continues generating past the end-of-sequence token.
min_tokens	integer	Minimum number of tokens to generate before stopping. Useful for ensuring responses are not too short.
skip_special_tokens	boolean	If `true`, removes special tokens from the generated output.
spaces_between_special_tokens	boolean	If `true`, adds spaces between special tokens in the output.
truncate_prompt_tokens	integer	If set, truncates the prompt to this many tokens. Useful for handling inputs that may exceed context limits.
echo	boolean	If `true` and the last message role matches the generation role, prepends that message to the output.
add_generation_prompt	boolean	If `true`, adds the generation prompt from the chat template, such as `<\|assistant\|>`. Set to `false` for completion-style generation.
add_special_tokens	boolean	If `true`, adds special tokens like BOS to the prompt beyond what the chat template adds. For most models, the chat template handles special tokens, so this should be `false`.
documents	array	A list of documents for RAG (retrieval-augmented generation). Each document is a dict with string keys and values that the model can reference.
chat_template	string	A custom Jinja template for formatting the conversation. If not provided, uses the model's default template.
chat_template_args	object	Additional arguments to pass to the chat template renderer.
disaggregated_params	object	Advanced parameters for disaggregated serving. Used internally for distributed inference.

View JSON Schema on GitHub

JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "#/components/schemas/ChatCompletionRequest",
  "title": "ChatCompletionRequest",
  "additionalProperties": false,
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "$ref": "#/components/schemas/ChatCompletionMessage"
      },
      "description": "A list of messages representing the conversation history. Supports roles: `system`, `user`, `assistant`, and `tool`."
    },
    "model": {
      "title": "Model",
      "type": "string",
      "description": "The model slug to use for completion, such as `deepseek-ai/DeepSeek-V4-Pro`. Find available models at [Model APIs](https://app.baseten.co/model-apis/create)."
    },
    "frequency_penalty": {
      "default": 0,
      "title": "Frequency Penalty",
      "description": "Penalizes tokens based on how frequently they appear in the text so far. Positive values decrease repetition. Support varies by model.",
      "type": "number"
    },
    "logit_bias": {
      "default": null,
      "title": "Logit Bias",
      "description": "A map of token IDs to bias values (-100 to 100). Use this to increase or decrease the likelihood of specific tokens appearing in the output.",
      "additionalProperties": {
        "type": "number"
      },
      "type": "object"
    },
    "logprobs": {
      "default": false,
      "title": "Logprobs",
      "description": "If `true`, returns log probabilities of the output tokens. Log probability support varies by model.",
      "type": "boolean"
    },
    "top_logprobs": {
      "default": 0,
      "title": "Top Logprobs",
      "description": "Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`. Log probability support varies by model.",
      "type": "integer"
    },
    "max_tokens": {
      "default": 4096,
      "maximum": 262144,
      "minimum": 1,
      "title": "Max Tokens",
      "type": "integer",
      "description": "Maximum number of tokens to generate. If your request input plus `max_tokens` exceeds the model's context length, `max_tokens` is truncated. If your request exceeds the context length by more than 16k tokens or if `max_tokens` signals no preference, context reservation is throttled to 49512 tokens. Higher `max_tokens` values slightly deprioritize request scheduling."
    },
    "n": {
      "default": 1,
      "title": "N",
      "description": "Number of completions to generate. Only `1` is supported.",
      "type": "integer"
    },
    "presence_penalty": {
      "default": 0,
      "title": "Presence Penalty",
      "description": "Penalizes tokens based on whether they have appeared in the text so far. Positive values encourage the model to discuss new topics. Support varies by model.",
      "type": "number"
    },
    "response_format": {
      "anyOf": [
        {
          "$ref": "#/components/schemas/ResponseFormatText"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatJson"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatJsonObject"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatGrammar"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatStructuralTag"
        }
      ],
      "default": null,
      "title": "Response Format",
      "description": "Specifies the output format. Use `{\"type\": \"json_object\"}` for JSON mode, or `{\"type\": \"json_schema\", \"json_schema\": {...}}` for structured outputs with a specific schema."
    },
    "seed": {
      "default": null,
      "title": "Seed",
      "description": "Random seed for deterministic generation. Determinism is not guaranteed across different hardware or model versions.",
      "type": "integer"
    },
    "stop": {
      "anyOf": [
        {
          "maxLength": 1000,
          "minLength": 1,
          "type": "string"
        },
        {
          "items": {
            "maxLength": 1000,
            "minLength": 1,
            "type": "string"
          },
          "maxItems": 32,
          "type": "array"
        }
      ],
      "title": "Stop",
      "description": "Up to 32 sequences where the API stops generating further tokens. Can be a string or array of strings."
    },
    "stream": {
      "default": false,
      "title": "Stream",
      "description": "If `true`, responses are streamed back as server-sent events (SSE) as they are generated.",
      "type": "boolean"
    },
    "stream_options": {
      "default": null,
      "description": "Options for streaming responses. Set `include_usage: true` to receive token usage statistics in the final chunk.",
      "$ref": "#/components/schemas/StreamOptions"
    },
    "temperature": {
      "default": null,
      "title": "Temperature",
      "description": "Controls randomness in the output. Lower values like 0.2 produce more focused and deterministic responses. Higher values like 1.5 produce more creative and varied output.",
      "maximum": 4,
      "minimum": 0,
      "type": "number"
    },
    "top_p": {
      "default": 1,
      "title": "Top P",
      "description": "Nucleus sampling: only consider tokens with cumulative probability up to this value. Lower values like 0.1 produce more focused output.",
      "exclusiveMinimum": 0,
      "maximum": 1,
      "type": "number"
    },
    "tools": {
      "default": null,
      "title": "Tools",
      "description": "A list of tools (functions) the model may call. Each tool should have a `type: \"function\"` and a `function` object with `name`, `description`, and `parameters`.",
      "items": {
        "$ref": "#/components/schemas/ChatCompletionToolsParam"
      },
      "type": "array"
    },
    "tool_choice": {
      "anyOf": [
        {
          "enum": [
            "none",
            "required",
            "auto"
          ],
          "type": "string"
        },
        {
          "$ref": "#/components/schemas/ChatCompletionNamedToolChoiceParam"
        }
      ],
      "default": null,
      "title": "Tool Choice",
      "description": "Controls which tool (if any) the model calls.\n\n- `none`: Never call a tool.\n- `auto`: Model decides whether to call a tool.\n- `required`: Model must call at least one tool.\n- `{\"type\": \"function\", \"function\": {\"name\": \"...\"}}`: Call a specific function."
    },
    "parallel_tool_calls": {
      "default": true,
      "title": "Parallel Tool Calls",
      "description": "If `true`, the model can call multiple tools in a single response.",
      "type": "boolean"
    },
    "user": {
      "default": null,
      "title": "User",
      "description": "A unique identifier for the end-user, useful for tracking and abuse detection.",
      "type": "string"
    },
    "best_of": {
      "default": null,
      "title": "Best Of",
      "description": "Number of candidate sequences to generate and return the best from. Only a value of 1 is supported.",
      "maximum": 1,
      "minimum": 1,
      "type": "integer"
    },
    "top_k": {
      "default": 50,
      "title": "Top K",
      "description": "Limits token selection to the top K most probable tokens at each step. Lower values like 10 produce more focused output. Set to -1 to disable.",
      "type": "integer"
    },
    "top_p_min": {
      "default": 0,
      "title": "Top P Min",
      "type": "number",
      "description": "Minimum value for dynamic `top_p`. When set, `top_p` dynamically adjusts but does not go below this value."
    },
    "min_p": {
      "default": 0,
      "title": "Min P",
      "type": "number",
      "description": "Minimum probability threshold for token selection. Filters out tokens with probability below `min_p * max_probability`."
    },
    "repetition_penalty": {
      "default": 1,
      "title": "Repetition Penalty",
      "type": "number",
      "description": "Multiplicative penalty for repeated tokens. Values greater than 1.0 discourage repetition, values less than 1.0 encourage it."
    },
    "length_penalty": {
      "default": 1,
      "title": "Length Penalty",
      "type": "number",
      "description": "Exponential penalty applied to sequence length during beam search. Values greater than 1.0 favor longer sequences."
    },
    "early_stopping": {
      "default": false,
      "title": "Early Stopping",
      "type": "boolean",
      "description": "If `true`, stops generation when at least `n` complete candidates are found."
    },
    "bad": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        }
      ],
      "title": "Bad",
      "description": "Words or phrases to avoid in the output. Support varies by model."
    },
    "bad_token_ids": {
      "title": "Bad Token Ids",
      "description": "Token IDs to avoid in the output. Support varies by model.",
      "items": {
        "type": "integer"
      },
      "type": "array"
    },
    "stop_token_ids": {
      "title": "Stop Token Ids",
      "description": "List of token IDs that cause generation to stop when encountered.",
      "items": {
        "type": "integer"
      },
      "type": "array"
    },
    "include_stop_str_in_output": {
      "default": false,
      "title": "Include Stop Str In Output",
      "type": "boolean",
      "description": "If `true`, includes the matched stop string in the output."
    },
    "ignore_eos": {
      "default": false,
      "title": "Ignore Eos",
      "type": "boolean",
      "description": "If `true`, continues generating past the end-of-sequence token."
    },
    "min_tokens": {
      "default": 0,
      "title": "Min Tokens",
      "type": "integer",
      "description": "Minimum number of tokens to generate before stopping. Useful for ensuring responses are not too short."
    },
    "skip_special_tokens": {
      "default": true,
      "title": "Skip Special Tokens",
      "type": "boolean",
      "description": "If `true`, removes special tokens from the generated output."
    },
    "spaces_between_special_tokens": {
      "default": true,
      "title": "Spaces Between Special Tokens",
      "type": "boolean",
      "description": "If `true`, adds spaces between special tokens in the output."
    },
    "truncate_prompt_tokens": {
      "default": null,
      "title": "Truncate Prompt Tokens",
      "description": "If set, truncates the prompt to this many tokens. Useful for handling inputs that may exceed context limits.",
      "minimum": 1,
      "type": "integer"
    },
    "echo": {
      "default": false,
      "description": "If `true` and the last message role matches the generation role, prepends that message to the output.",
      "title": "Echo",
      "type": "boolean"
    },
    "add_generation_prompt": {
      "default": true,
      "description": "If `true`, adds the generation prompt from the chat template, such as `<|assistant|>`. Set to `false` for completion-style generation.",
      "title": "Add Generation Prompt",
      "type": "boolean"
    },
    "add_special_tokens": {
      "default": false,
      "description": "If `true`, adds special tokens like BOS to the prompt beyond what the chat template adds. For most models, the chat template handles special tokens, so this should be `false`.",
      "title": "Add Special Tokens",
      "type": "boolean"
    },
    "documents": {
      "default": null,
      "description": "A list of documents for RAG (retrieval-augmented generation). Each document is a dict with string keys and values that the model can reference.",
      "title": "Documents",
      "items": {
        "additionalProperties": {
          "type": "string"
        },
        "type": "object"
      },
      "type": "array"
    },
    "chat_template": {
      "default": null,
      "description": "A custom Jinja template for formatting the conversation. If not provided, uses the model's default template.",
      "title": "Chat Template",
      "type": "string"
    },
    "chat_template_args": {
      "default": null,
      "description": "Additional arguments to pass to the chat template renderer.",
      "title": "Chat Template Args",
      "additionalProperties": true,
      "type": "object"
    },
    "disaggregated_params": {
      "default": null,
      "description": "Advanced parameters for disaggregated serving. Used internally for distributed inference.",
      "$ref": "#/components/schemas/DisaggregatedParams"
    }
  },
  "required": [
    "messages",
    "model"
  ],
  "type": "object",
  "description": "Request body for creating a chat completion."
}