Baseten · Schema
ChatCompletionRequest
Request body for creating a chat completion.
AI · ML · Inference · Deployment · MLOps · OpenAI Compatible · Anthropic Compatible · Truss
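Only `messages` and `model` are required. Every other field falls back to its schema default. The sketch below builds such a minimal body and shows one way to send it using only the Python standard library. The base URL, the `/chat/completions` path, bearer-token authentication, and the `{"role": ..., "content": ...}` message shape are assumptions borrowed from the OpenAI-compatible convention, not taken from this schema. `build_minimal_request` and `send_request` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: OpenAI-compatible base URL for Baseten Model APIs.
# Confirm the URL and auth scheme in your Baseten account before use.
BASE_URL = "https://inference.baseten.co/v1"


def build_minimal_request(model: str, user_text: str) -> dict:
    """Return a ChatCompletionRequest body with only the two required fields."""
    return {
        "model": model,
        "messages": [
            # Assumed OpenAI-style message shape: {"role": ..., "content": ...}.
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": user_text},
        ],
    }


def send_request(body: dict, api_key: str) -> dict:
    """POST the body to the assumed /chat/completions path and decode the reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_minimal_request("deepseek-ai/DeepSeek-V4-Pro", "Say hello in one word.")
print(json.dumps(body, indent=2))
```

Calling `send_request(body, api_key)` performs a real network request, so it is left out of the example run.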
Properties
| Name | Type | Description |
|---|---|---|
| messages | array | A list of messages representing the conversation history. Supports roles: `system`, `user`, `assistant`, and `tool`. |
| model | string | The model slug to use for completion, such as `deepseek-ai/DeepSeek-V4-Pro`. Find available models at [Model APIs](https://app.baseten.co/model-apis/create). |
| frequency_penalty | number | Penalizes tokens based on how frequently they appear in the text so far. Positive values decrease repetition. Support varies by model. |
| logit_bias | object | A map of token IDs to bias values (-100 to 100). Use this to increase or decrease the likelihood of specific tokens appearing in the output. |
| logprobs | boolean | If `true`, returns log probabilities of the output tokens. Log probability support varies by model. |
| top_logprobs | integer | Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`. Log probability support varies by model. |
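| max_tokens | integer | Maximum number of tokens to generate (1 to 262144; default 4096). If your request input plus `max_tokens` exceeds the model's context length, `max_tokens` is truncated. If your request exceeds the context length by more than 16k tokens or if `max_tokens` signals no preference, context reservation is throttled to 49512 tokens. Higher `max_tokens` values slightly deprioritize request scheduling. |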
| n | integer | Number of completions to generate. Only `1` is supported. |
| presence_penalty | number | Penalizes tokens based on whether they have appeared in the text so far. Positive values encourage the model to discuss new topics. Support varies by model. |
| response_format | object | Specifies the output format. Use `{"type": "json_object"}` for JSON mode, or `{"type": "json_schema", "json_schema": {...}}` for structured outputs that follow a specific schema. The schema also defines text, grammar, and structural-tag variants. |
| seed | integer | Random seed for deterministic generation. Determinism is not guaranteed across different hardware or model versions. |
| stop | string or array | Up to 32 sequences, each 1 to 1000 characters long, at which the API stops generating further tokens. Can be a single string or an array of strings. |
| stream | boolean | If `true`, responses are streamed back as server-sent events (SSE) as they are generated. |
| stream_options | object | Options for streaming responses. Set `include_usage: true` to receive token usage statistics in the final chunk. |
| temperature | number | Controls randomness in the output, from 0 to 4. Lower values like 0.2 produce more focused and deterministic responses; higher values like 1.5 produce more creative and varied output. |
| top_p | number | Nucleus sampling: only consider tokens with cumulative probability up to this value. Must be greater than 0 and at most 1. Lower values like 0.1 produce more focused output. |
| tools | array | A list of tools (functions) the model may call. Each tool should have a `type: "function"` and a `function` object with `name`, `description`, and `parameters`. |
| tool_choice | string or object | Controls which tool (if any) the model calls: `none` never calls a tool; `auto` lets the model decide whether to call a tool; `required` makes the model call at least one tool; `{"type": "function", "function": {"name": "..."}}` calls a specific function. |
| parallel_tool_calls | boolean | If `true`, the model can call multiple tools in a single response. |
| user | string | A unique identifier for the end-user, useful for tracking and abuse detection. |
| best_of | integer | Number of candidate sequences to generate, from which the best is returned. Only `1` is supported. |
| top_k | integer | Limits token selection to the top K most probable tokens at each step. Lower values like 10 produce more focused output. Set to -1 to disable. |
| top_p_min | number | Minimum value for dynamic `top_p`. When set, `top_p` dynamically adjusts but does not go below this value. |
| min_p | number | Minimum probability threshold for token selection. Filters out tokens with probability below `min_p * max_probability`. |
| repetition_penalty | number | Multiplicative penalty for repeated tokens. Values greater than 1.0 discourage repetition, values less than 1.0 encourage it. |
| length_penalty | number | Exponential penalty applied to sequence length during beam search. Values greater than 1.0 favor longer sequences. |
| early_stopping | boolean | If `true`, stops generation when at least `n` complete candidates are found. |
| bad | string or array | Words or phrases to avoid in the output, given as a single string or an array of strings. Support varies by model. |
| bad_token_ids | array | Token IDs to avoid in the output. Support varies by model. |
| stop_token_ids | array | List of token IDs that cause generation to stop when encountered. |
| include_stop_str_in_output | boolean | If `true`, includes the matched stop string in the output. |
| ignore_eos | boolean | If `true`, continues generating past the end-of-sequence token. |
| min_tokens | integer | Minimum number of tokens to generate before stopping. Useful for ensuring responses are not too short. |
| skip_special_tokens | boolean | If `true`, removes special tokens from the generated output. |
| spaces_between_special_tokens | boolean | If `true`, adds spaces between special tokens in the output. |
| truncate_prompt_tokens | integer | If set, truncates the prompt to this many tokens. Useful for handling inputs that may exceed context limits. |
| echo | boolean | If `true` and the last message role matches the generation role, prepends that message to the output. |
| add_generation_prompt | boolean | If `true`, adds the generation prompt from the chat template, such as `<|assistant|>`. Set to `false` for completion-style generation. |
| add_special_tokens | boolean | If `true`, adds special tokens like BOS to the prompt beyond what the chat template adds. For most models, the chat template handles special tokens, so this should be `false`. |
| documents | array | A list of documents for RAG (retrieval-augmented generation). Each document is an object mapping string keys to string values that the model can reference. |
| chat_template | string | A custom Jinja template for formatting the conversation. If not provided, uses the model's default template. |
| chat_template_args | object | Additional arguments to pass to the chat template renderer. |
| disaggregated_params | object | Advanced parameters for disaggregated serving. Used internally for distributed inference. |
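The first `max_tokens` rule is plain arithmetic: the completion budget is capped by the room the prompt leaves in the context window. A minimal sketch, assuming the server clamps with a simple `min`. The helper name `effective_max_tokens` is hypothetical, rejecting a prompt that already fills the window is a choice made for this sketch only, and the 16k / 49512-token throttling behavior is not modeled.

```python
def effective_max_tokens(prompt_tokens: int, max_tokens: int, context_length: int) -> int:
    """Cap max_tokens so that prompt + completion fit in the context window.

    Models only the first max_tokens rule; the throttling behavior described
    for oversized requests is not reproduced here.
    """
    if not 1 <= max_tokens <= 262_144:
        raise ValueError("max_tokens must be between 1 and 262144 (schema bounds)")
    room = context_length - prompt_tokens
    if room < 1:
        # Choice made for this sketch only; the server's behavior may differ.
        raise ValueError("prompt leaves no room in the context window")
    return min(max_tokens, room)


print(effective_max_tokens(prompt_tokens=120_000, max_tokens=16_384, context_length=131_072))
```

With a 131072-token window, a 120000-token prompt leaves room for 11072 generated tokens, so a requested `max_tokens` of 16384 is cut to 11072.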
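`frequency_penalty`, `presence_penalty`, and `repetition_penalty` all lower the scores of tokens that have already appeared, but in different ways. The sketch below applies them to a map of token IDs to raw logits. It assumes the OpenAI-style additive formula for the first two (subtract `frequency_penalty × count`, plus a flat `presence_penalty` once a token has appeared) and the common CTRL-style convention for the multiplicative penalty (divide positive logits, multiply negative ones). The function name `apply_penalties`, the order of application, and counting only generated tokens are all assumptions; serving engines differ on each.

```python
from collections import Counter


def apply_penalties(
    logits: dict[int, float],
    generated: list[int],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
    repetition_penalty: float = 1.0,
) -> dict[int, float]:
    """Return a penalized copy of `logits` (token ID -> raw score)."""
    counts = Counter(generated)
    penalized = {}
    for token, logit in logits.items():
        count = counts[token]  # Counter returns 0 for unseen tokens
        if count > 0:
            # Multiplicative penalty: values > 1.0 push seen tokens toward lower scores.
            if logit > 0:
                logit /= repetition_penalty
            else:
                logit *= repetition_penalty
            # Additive penalties: one term grows with the count, the other is flat.
            logit -= frequency_penalty * count + presence_penalty
        penalized[token] = logit
    return penalized


scores = apply_penalties(
    {1: 2.0, 2: -1.0, 3: 0.5},
    generated=[1, 1, 2],
    frequency_penalty=0.5,
    presence_penalty=0.25,
    repetition_penalty=2.0,
)
print(scores)
```

Token 3 never appeared, so it keeps its score of 0.5, while token 1, seen twice, falls from 2.0 to -0.25.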
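`top_k`, `min_p`, and `top_p` each prune the candidate set before a token is sampled. This sketch applies them to an already-normalized probability table in the order top-k, min-p, top-p, without renormalizing between steps. That order and the helper name `surviving_tokens` are assumptions; serving engines vary in both ordering and renormalization.

```python
def surviving_tokens(
    probs: dict[int, float], top_k: int = 50, top_p: float = 1.0, min_p: float = 0.0
) -> set[int]:
    """Return the token IDs still eligible for sampling after all three filters."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # top_k: keep only the K most probable tokens; -1 disables the filter.
    if top_k != -1:
        ranked = ranked[:top_k]
    # min_p: drop tokens whose probability is below min_p * max_probability.
    threshold = min_p * ranked[0][1]
    ranked = [(token, p) for token, p in ranked if p >= threshold]
    # top_p: keep the smallest high-probability prefix whose mass reaches top_p.
    kept, mass = set(), 0.0
    for token, p in ranked:
        kept.add(token)
        mass += p
        if mass >= top_p:
            break
    return kept


probs = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
print(surviving_tokens(probs, top_k=-1, top_p=0.7))  # nucleus keeps tokens 0 and 1
print(surviving_tokens(probs, top_k=-1, min_p=0.5))  # threshold 0.25 keeps 0 and 1
print(surviving_tokens(probs, top_k=1))              # only the top token survives
```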
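A tool-calling request pairs `tools` with `tool_choice`. The sketch below forces a call to one named function and disables parallel calls. The `get_weather` tool, its description, and its parameters are hypothetical, and describing `parameters` as a JSON Schema object follows the OpenAI-compatible convention rather than anything this page defines.

```python
import json

# Hypothetical tool: the name, description, and parameters are illustrative only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        # Assumed, as in the OpenAI convention: `parameters` is a JSON Schema object.
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [get_weather_tool],
    # Named choice: the model must call get_weather. The string forms
    # "none", "auto", and "required" are the alternatives.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    # Disallow multiple tool calls in a single response.
    "parallel_tool_calls": False,
}

print(json.dumps(request_body, indent=2))
```

Switching `tool_choice` to `"auto"` lets the model decide whether the question needs the tool at all.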
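With `stream: true`, the reply arrives as server-sent events, and `stream_options.include_usage` adds a final chunk with token counts. The sketch below assumes OpenAI-style framing (`data: {json}` lines, `choices[].delta.content` fragments, and a closing `data: [DONE]`), which this schema does not spell out. `stream_request_body` and `collect_stream` are hypothetical helpers that operate on already-decoded text lines, and the `sample` lines are synthetic.

```python
import json
from typing import Iterable, Optional


def stream_request_body(model: str, prompt: str) -> dict:
    """Build a streaming request that also asks for a final usage chunk."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "stream_options": {"include_usage": True},
    }


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Join streamed content deltas and return (full_text, usage)."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comment lines carry no data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected in the final chunk when include_usage is set
    return "".join(parts), usage


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 7}}',
    "data: [DONE]",
]
print(collect_stream(sample))
```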
JSON Schema
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "#/components/schemas/ChatCompletionRequest",
  "title": "ChatCompletionRequest",
  "additionalProperties": false,
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "$ref": "#/components/schemas/ChatCompletionMessage"
      },
      "description": "A list of messages representing the conversation history. Supports roles: `system`, `user`, `assistant`, and `tool`."
    },
    "model": {
      "title": "Model",
      "type": "string",
      "description": "The model slug to use for completion, such as `deepseek-ai/DeepSeek-V4-Pro`. Find available models at [Model APIs](https://app.baseten.co/model-apis/create)."
    },
    "frequency_penalty": {
      "default": 0,
      "title": "Frequency Penalty",
      "description": "Penalizes tokens based on how frequently they appear in the text so far. Positive values decrease repetition. Support varies by model.",
      "type": "number"
    },
    "logit_bias": {
      "default": null,
      "title": "Logit Bias",
      "description": "A map of token IDs to bias values (-100 to 100). Use this to increase or decrease the likelihood of specific tokens appearing in the output.",
      "additionalProperties": {
        "type": "number"
      },
      "type": "object"
    },
    "logprobs": {
      "default": false,
      "title": "Logprobs",
      "description": "If `true`, returns log probabilities of the output tokens. Log probability support varies by model.",
      "type": "boolean"
    },
    "top_logprobs": {
      "default": 0,
      "title": "Top Logprobs",
      "description": "Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`. Log probability support varies by model.",
      "type": "integer"
    },
    "max_tokens": {
      "default": 4096,
      "maximum": 262144,
      "minimum": 1,
      "title": "Max Tokens",
      "type": "integer",
      "description": "Maximum number of tokens to generate. If your request input plus `max_tokens` exceeds the model's context length, `max_tokens` is truncated. If your request exceeds the context length by more than 16k tokens or if `max_tokens` signals no preference, context reservation is throttled to 49512 tokens. Higher `max_tokens` values slightly deprioritize request scheduling."
    },
    "n": {
      "default": 1,
      "title": "N",
      "description": "Number of completions to generate. Only `1` is supported.",
      "type": "integer"
    },
    "presence_penalty": {
      "default": 0,
      "title": "Presence Penalty",
      "description": "Penalizes tokens based on whether they have appeared in the text so far. Positive values encourage the model to discuss new topics. Support varies by model.",
      "type": "number"
    },
    "response_format": {
      "anyOf": [
        {
          "$ref": "#/components/schemas/ResponseFormatText"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatJson"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatJsonObject"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatGrammar"
        },
        {
          "$ref": "#/components/schemas/ResponseFormatStructuralTag"
        }
      ],
      "default": null,
      "title": "Response Format",
      "description": "Specifies the output format. Use `{\"type\": \"json_object\"}` for JSON mode, or `{\"type\": \"json_schema\", \"json_schema\": {...}}` for structured outputs with a specific schema."
    },
    "seed": {
      "default": null,
      "title": "Seed",
      "description": "Random seed for deterministic generation. Determinism is not guaranteed across different hardware or model versions.",
      "type": "integer"
    },
    "stop": {
      "anyOf": [
        {
          "maxLength": 1000,
          "minLength": 1,
          "type": "string"
        },
        {
          "items": {
            "maxLength": 1000,
            "minLength": 1,
            "type": "string"
          },
          "maxItems": 32,
          "type": "array"
        }
      ],
      "title": "Stop",
      "description": "Up to 32 sequences where the API stops generating further tokens. Can be a string or array of strings."
    },
    "stream": {
      "default": false,
      "title": "Stream",
      "description": "If `true`, responses are streamed back as server-sent events (SSE) as they are generated.",
      "type": "boolean"
    },
    "stream_options": {
      "default": null,
      "description": "Options for streaming responses. Set `include_usage: true` to receive token usage statistics in the final chunk.",
      "$ref": "#/components/schemas/StreamOptions"
    },
    "temperature": {
      "default": null,
      "title": "Temperature",
      "description": "Controls randomness in the output. Lower values like 0.2 produce more focused and deterministic responses. Higher values like 1.5 produce more creative and varied output.",
      "maximum": 4,
      "minimum": 0,
      "type": "number"
    },
    "top_p": {
      "default": 1,
      "title": "Top P",
      "description": "Nucleus sampling: only consider tokens with cumulative probability up to this value. Lower values like 0.1 produce more focused output.",
      "exclusiveMinimum": 0,
      "maximum": 1,
      "type": "number"
    },
    "tools": {
      "default": null,
      "title": "Tools",
      "description": "A list of tools (functions) the model may call. Each tool should have a `type: \"function\"` and a `function` object with `name`, `description`, and `parameters`.",
      "items": {
        "$ref": "#/components/schemas/ChatCompletionToolsParam"
      },
      "type": "array"
    },
    "tool_choice": {
      "anyOf": [
        {
          "enum": [
            "none",
            "required",
            "auto"
          ],
          "type": "string"
        },
        {
          "$ref": "#/components/schemas/ChatCompletionNamedToolChoiceParam"
        }
      ],
      "default": null,
      "title": "Tool Choice",
      "description": "Controls which tool (if any) the model calls.\n\n- `none`: Never call a tool.\n- `auto`: Model decides whether to call a tool.\n- `required`: Model must call at least one tool.\n- `{\"type\": \"function\", \"function\": {\"name\": \"...\"}}`: Call a specific function."
    },
    "parallel_tool_calls": {
      "default": true,
      "title": "Parallel Tool Calls",
      "description": "If `true`, the model can call multiple tools in a single response.",
      "type": "boolean"
    },
    "user": {
      "default": null,
      "title": "User",
      "description": "A unique identifier for the end-user, useful for tracking and abuse detection.",
      "type": "string"
    },
    "best_of": {
      "default": null,
      "title": "Best Of",
      "description": "Number of candidate sequences to generate and return the best from. Only a value of 1 is supported.",
      "maximum": 1,
      "minimum": 1,
      "type": "integer"
    },
    "top_k": {
      "default": 50,
      "title": "Top K",
      "description": "Limits token selection to the top K most probable tokens at each step. Lower values like 10 produce more focused output. Set to -1 to disable.",
      "type": "integer"
    },
    "top_p_min": {
      "default": 0,
      "title": "Top P Min",
      "type": "number",
      "description": "Minimum value for dynamic `top_p`. When set, `top_p` dynamically adjusts but does not go below this value."
    },
    "min_p": {
      "default": 0,
      "title": "Min P",
      "type": "number",
      "description": "Minimum probability threshold for token selection. Filters out tokens with probability below `min_p * max_probability`."
    },
    "repetition_penalty": {
      "default": 1,
      "title": "Repetition Penalty",
      "type": "number",
      "description": "Multiplicative penalty for repeated tokens. Values greater than 1.0 discourage repetition, values less than 1.0 encourage it."
    },
    "length_penalty": {
      "default": 1,
      "title": "Length Penalty",
      "type": "number",
      "description": "Exponential penalty applied to sequence length during beam search. Values greater than 1.0 favor longer sequences."
    },
    "early_stopping": {
      "default": false,
      "title": "Early Stopping",
      "type": "boolean",
      "description": "If `true`, stops generation when at least `n` complete candidates are found."
    },
    "bad": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        }
      ],
      "title": "Bad",
      "description": "Words or phrases to avoid in the output. Support varies by model."
    },
    "bad_token_ids": {
      "title": "Bad Token Ids",
      "description": "Token IDs to avoid in the output. Support varies by model.",
      "items": {
        "type": "integer"
      },
      "type": "array"
    },
    "stop_token_ids": {
      "title": "Stop Token Ids",
      "description": "List of token IDs that cause generation to stop when encountered.",
      "items": {
        "type": "integer"
      },
      "type": "array"
    },
    "include_stop_str_in_output": {
      "default": false,
      "title": "Include Stop Str In Output",
      "type": "boolean",
      "description": "If `true`, includes the matched stop string in the output."
    },
    "ignore_eos": {
      "default": false,
      "title": "Ignore Eos",
      "type": "boolean",
      "description": "If `true`, continues generating past the end-of-sequence token."
    },
    "min_tokens": {
      "default": 0,
      "title": "Min Tokens",
      "type": "integer",
      "description": "Minimum number of tokens to generate before stopping. Useful for ensuring responses are not too short."
    },
    "skip_special_tokens": {
      "default": true,
      "title": "Skip Special Tokens",
      "type": "boolean",
      "description": "If `true`, removes special tokens from the generated output."
    },
    "spaces_between_special_tokens": {
      "default": true,
      "title": "Spaces Between Special Tokens",
      "type": "boolean",
      "description": "If `true`, adds spaces between special tokens in the output."
    },
    "truncate_prompt_tokens": {
      "default": null,
      "title": "Truncate Prompt Tokens",
      "description": "If set, truncates the prompt to this many tokens. Useful for handling inputs that may exceed context limits.",
      "minimum": 1,
      "type": "integer"
    },
    "echo": {
      "default": false,
      "description": "If `true` and the last message role matches the generation role, prepends that message to the output.",
      "title": "Echo",
      "type": "boolean"
    },
    "add_generation_prompt": {
      "default": true,
      "description": "If `true`, adds the generation prompt from the chat template, such as `<|assistant|>`. Set to `false` for completion-style generation.",
      "title": "Add Generation Prompt",
      "type": "boolean"
    },
    "add_special_tokens": {
      "default": false,
      "description": "If `true`, adds special tokens like BOS to the prompt beyond what the chat template adds. For most models, the chat template handles special tokens, so this should be `false`.",
      "title": "Add Special Tokens",
      "type": "boolean"
    },
    "documents": {
      "default": null,
      "description": "A list of documents for RAG (retrieval-augmented generation). Each document is a dict with string keys and values that the model can reference.",
      "title": "Documents",
      "items": {
        "additionalProperties": {
          "type": "string"
        },
        "type": "object"
      },
      "type": "array"
    },
    "chat_template": {
      "default": null,
      "description": "A custom Jinja template for formatting the conversation. If not provided, uses the model's default template.",
      "title": "Chat Template",
      "type": "string"
    },
    "chat_template_args": {
      "default": null,
      "description": "Additional arguments to pass to the chat template renderer.",
      "title": "Chat Template Args",
      "additionalProperties": true,
      "type": "object"
    },
    "disaggregated_params": {
      "default": null,
      "description": "Advanced parameters for disaggregated serving. Used internally for distributed inference.",
      "$ref": "#/components/schemas/DisaggregatedParams"
    }
  },
  "required": [
    "messages",
    "model"
  ],
  "type": "object",
  "description": "Request body for creating a chat completion."
}
```
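Because the schema sets `additionalProperties: false` and bounds several fields, some mistakes can be caught before a request is sent. The sketch below checks a subset of those constraints using only the standard library. It is not a full JSON Schema validator (a library such as `jsonschema` would be needed for that), and the `n`, `top_logprobs`, and `logprobs` checks come from the field descriptions rather than from schema keywords. `validate_request`, `ALLOWED_FIELDS`, and `NUMERIC_BOUNDS` are hypothetical names.

```python
ALLOWED_FIELDS = frozenset({
    "messages", "model", "frequency_penalty", "logit_bias", "logprobs",
    "top_logprobs", "max_tokens", "n", "presence_penalty", "response_format",
    "seed", "stop", "stream", "stream_options", "temperature", "top_p",
    "tools", "tool_choice", "parallel_tool_calls", "user", "best_of", "top_k",
    "top_p_min", "min_p", "repetition_penalty", "length_penalty",
    "early_stopping", "bad", "bad_token_ids", "stop_token_ids",
    "include_stop_str_in_output", "ignore_eos", "min_tokens",
    "skip_special_tokens", "spaces_between_special_tokens",
    "truncate_prompt_tokens", "echo", "add_generation_prompt",
    "add_special_tokens", "documents", "chat_template", "chat_template_args",
    "disaggregated_params",
})

# (field, lower bound, upper bound, lower bound is exclusive)
NUMERIC_BOUNDS = [
    ("max_tokens", 1, 262_144, False),
    ("temperature", 0, 4, False),
    ("top_p", 0, 1, True),
    ("best_of", 1, 1, False),
    ("truncate_prompt_tokens", 1, float("inf"), False),
    ("top_logprobs", 0, 20, False),  # from the description, not a schema keyword
]


def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means none of these checks failed."""
    errors = [f"missing required field: {f}" for f in ("messages", "model") if f not in body]
    # additionalProperties: false means unknown top-level fields are rejected.
    errors += [f"unknown field: {f}" for f in sorted(set(body) - ALLOWED_FIELDS)]

    for field, low, high, exclusive in NUMERIC_BOUNDS:
        value = body.get(field)
        if value is None:
            continue  # absent or null: the schema default applies
        too_low = value <= low if exclusive else value < low
        if too_low or value > high:
            errors.append(f"{field}={value} is out of range")

    if body.get("n", 1) != 1:
        errors.append("n: only 1 is supported")
    if body.get("top_logprobs") and not body.get("logprobs"):
        errors.append("top_logprobs requires logprobs: true")

    stop = body.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 32:
            errors.append("stop: at most 32 sequences")
        if any(not 1 <= len(s) <= 1000 for s in sequences):
            errors.append("stop: each sequence must be 1 to 1000 characters")
    return errors


print(validate_request({"model": "deepseek-ai/DeepSeek-V4-Pro", "messages": [], "temperature": 5}))
```

Running these checks client-side gives clearer error messages, but the server remains the authority on what it accepts.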