Weaviate · Schema

TokenizerUserDictConfig

A list of source/target string pairs. During tokenization, each source string is replaced with its target string.

Vector Database · AI · Machine Learning · Semantic Search · Open Source · GraphQL · Kubernetes

Properties

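| Name | Type | Description |
| --- | --- | --- |
| tokenizer | string | The tokenizer to which the user dictionary should be applied. Currently, only the `kagome` ja and kr tokenizers support user dictionaries. |
| replacements | array | Array of objects, each with a required `source` string (the string to be replaced) and a required `target` string (the string to replace with). |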
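
As an illustration of the two properties above, a configuration object might be assembled like this. It is a minimal sketch only: the tokenizer name `kagome_ja` and the replacement strings are assumed example values, not values given on this page.

```python
import json

# Hypothetical example values: the tokenizer name and the replacement
# strings below are assumptions chosen purely for illustration.
user_dict_config = {
    "tokenizer": "kagome_ja",
    "replacements": [
        # Each entry needs both keys: "source" is the string to be replaced,
        # "target" is the string to replace it with.
        {"source": "Weaviate", "target": "We Aviate"},
        {"source": "Semi Technologies", "target": "SemiTechnologies"},
    ],
}

# Serialize to JSON; ensure_ascii=False keeps non-ASCII text readable.
config_json = json.dumps(user_dict_config, ensure_ascii=False, indent=2)
print(config_json)
```

Each entry in `replacements` carries exactly one `source` and one `target`, which is the shape the schema requires.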

JSON Schema

weaviate-tokenizer-user-dict-config-schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api-evangelist.github.io/weaviate/json-schema/weaviate-tokenizer-user-dict-config-schema.json",
  "title": "TokenizerUserDictConfig",
  "description": "A list of pairs of strings that should be replaced with another string during tokenization.",
  "type": "object",
  "properties": {
    "tokenizer": {
      "type": "string",
      "description": "The tokenizer to which the user dictionary should be applied. Currently, only the `kagome` ja and kr tokenizers support user dictionaries."
    },
    "replacements": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "source",
          "target"
        ],
        "properties": {
          "source": {
            "type": "string",
            "description": "The string to be replaced."
          },
          "target": {
            "type": "string",
            "description": "The string to replace with."
          }
        }
      }
    }
  }
}
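
The constraints in the schema above are small enough to check by hand. The following is a rough sketch of such a check using only the Python standard library. The function name `validate_user_dict_config` and its error messages are hypothetical, and it is not a general JSON Schema validator. Because the schema declares no top-level `required` list, both `tokenizer` and `replacements` are treated as optional.

```python
from typing import Any


def validate_user_dict_config(config: Any) -> list[str]:
    """Return a list of problems; an empty list means the config conforms."""
    errors: list[str] = []
    if not isinstance(config, dict):
        return ["config must be an object"]

    # "tokenizer" is optional, but must be a string when present.
    if "tokenizer" in config and not isinstance(config["tokenizer"], str):
        errors.append("tokenizer must be a string")

    # "replacements" is optional too; when present it must be an array.
    replacements = config.get("replacements", [])
    if not isinstance(replacements, list):
        return errors + ["replacements must be an array"]

    for i, item in enumerate(replacements):
        if not isinstance(item, dict):
            errors.append(f"replacements[{i}] must be an object")
            continue
        # Every replacement needs string-valued "source" and "target" keys.
        for key in ("source", "target"):
            if key not in item:
                errors.append(f"replacements[{i}] is missing required '{key}'")
            elif not isinstance(item[key], str):
                errors.append(f"replacements[{i}].{key} must be a string")
    return errors


# Example: the single replacement lacks "target", so one problem is reported.
print(validate_user_dict_config(
    {"tokenizer": "kagome_ja", "replacements": [{"source": "a"}]}
))
```

Returning a list of messages instead of raising on the first failure lets a caller report every malformed entry in one pass.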