Mistral AI · Schema

OcrPage

Agents, Artificial Intelligence, Batch Processing, Chat, Embeddings, Fine-Tuning, Large Language Models, OCR

Properties

| Name | Type | Description |
| --- | --- | --- |
| index | integer | The zero-based page index in the document. |
| markdown | string | The extracted content in markdown format, preserving document structure including headers, tables, and equations. |
| images | array | Extracted images from the page, if include_image_base64 was set. Each item references the ExtractedImage schema. |
| dimensions | object | References the PageDimensions schema. |
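
As a rough illustration of how these properties might be used together, the sketch below orders pages by their zero-based index and joins their markdown into one document. The helper name join_pages and the sample data are hypothetical and are not part of the Mistral AI API; the sketch also assumes every page carries an index, which the schema itself does not require.

```python
from typing import Any


def join_pages(pages: list[dict[str, Any]], separator: str = "\n\n") -> str:
    """Concatenate page markdown in document order (hypothetical helper)."""
    # Assumption: each page has an "index"; the schema does not mark it required.
    ordered = sorted(pages, key=lambda page: page["index"])
    # A page without a "markdown" key contributes an empty string.
    return separator.join(page.get("markdown", "") for page in ordered)


# Hypothetical sample data shaped like OcrPage objects; not real API output.
sample_pages = [
    {"index": 1, "markdown": "## Results\n\n| a | b |", "images": []},
    {"index": 0, "markdown": "# Title", "images": []},
]

combined = join_pages(sample_pages)
```

Sorting on index rather than relying on list order keeps the result stable even if pages arrive out of order.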

JSON Schema

mistral-ai-ocrpage-schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "#/components/schemas/OcrPage",
  "title": "OcrPage",
  "type": "object",
  "properties": {
    "index": {
      "type": "integer",
      "description": "The zero-based page index in the document."
    },
    "markdown": {
      "type": "string",
      "description": "The extracted content in markdown format, preserving document structure including headers, tables, and equations."
    },
    "images": {
      "type": "array",
      "description": "Extracted images from the page, if include_image_base64 was set.",
      "items": {
        "$ref": "#/components/schemas/ExtractedImage"
      }
    },
    "dimensions": {
      "$ref": "#/components/schemas/PageDimensions"
    }
  }
}
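
The type constraints in the schema above can be checked by hand with a few lines of Python. This is a minimal sketch, not a full JSON Schema validator: the function name check_ocr_page is hypothetical, and because the ExtractedImage and PageDimensions schemas are defined elsewhere, the contents of images items and of dimensions are left unchecked. Since the schema lists no required keys, every property is treated as optional.

```python
from typing import Any


def _is_integer(value: Any) -> bool:
    """JSON Schema 'integer': whole numbers, excluding booleans."""
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())


def check_ocr_page(page: Any) -> list[str]:
    """Return a list of type problems; an empty list means no problem was found."""
    if not isinstance(page, dict):
        return ["page must be a JSON object"]
    problems: list[str] = []
    # The schema has no "required" list, so absent keys are not errors.
    if "index" in page and not _is_integer(page["index"]):
        problems.append("index must be an integer")
    if "markdown" in page and not isinstance(page["markdown"], str):
        problems.append("markdown must be a string")
    if "images" in page and not isinstance(page["images"], list):
        problems.append("images must be an array")
    # Assumption: ExtractedImage and PageDimensions are validated elsewhere.
    return problems
```

For example, check_ocr_page({"index": "0"}) reports a single problem for index, while an empty object passes because nothing is required.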