Dictionary Storage Structure
The system stores dictionary entries in a default index named .analysis_ik. Users can override this with a custom index, provided it maintains the required document structure:
POST .custom_dict_index/_doc
{
  "dictionary_name": "tech_terms",
  "dictionary_category": "main_dicts",
  "entries": "Artificial Intelligence\nMachine Learning\nDeep Learning"
}
Key fields include:
entries: Contains the dictionary terms separated by newlines
dictionary_name: Identifier for the dictionary set
dictionary_category: Specifies dictionary type (main, stopwords, or quantifier)
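The newline-separated layout of the entries field can be sketched in a few lines of Python. This is only an illustration of the document structure described above; the helper names parse_entries and build_dictionary_doc are hypothetical and not part of the plugin.

```python
def parse_entries(entries: str) -> list[str]:
    """Split a newline-separated `entries` value into individual terms."""
    terms = []
    for line in entries.splitlines():
        term = line.strip()
        if term:  # ignore blank lines between terms
            terms.append(term)
    return terms


def build_dictionary_doc(name: str, category: str, terms: list[str]) -> dict:
    """Build a document body with the three fields listed above (hypothetical helper)."""
    return {
        "dictionary_name": name,
        "dictionary_category": category,
        "entries": "\n".join(terms),
    }
```

Joining the terms with "\n" keeps one term per line, which is the layout the entries field expects.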
Configuration Example
To apply custom dictionaries at the field level:
PUT tech_documents
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tech_analyzer": {
          "tokenizer": "tech_tokenizer"
        }
      },
      "tokenizer": {
        "tech_tokenizer": {
          "type": "ik_max_word",
          "use_custom_dict": true,
          "include_defaults": false,
          "case_sensitive": false,
          "dict_reference": "tech_terms"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "tech_analyzer"
      }
    }
  }
}
Configuration parameters:
use_custom_dict: Enables/disables custom dictionary usage
include_defaults: Controls whether to merge with default dictionary
case_sensitive: Determines case sensitivity in tokenization
dict_reference: Points to the dictionary name in the storage index
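One way to picture how these flags interact is the toy model below. It is a sketch under stated assumptions, not the plugin's implementation: the function effective_terms is hypothetical, and treating case_sensitive: false as lowercasing the dictionary terms is an assumption made only for illustration.

```python
def effective_terms(custom, defaults, use_custom_dict=True,
                    include_defaults=False, case_sensitive=False):
    """Toy model of which terms a tokenizer would consult (hypothetical)."""
    if not use_custom_dict:
        terms = set(defaults)                # custom dictionary disabled
    elif include_defaults:
        terms = set(custom) | set(defaults)  # merge custom and default terms
    else:
        terms = set(custom)                  # custom terms only
    if not case_sensitive:
        # Assumption: case-insensitive matching is modelled by lowercasing.
        terms = {t.lower() for t in terms}
    return terms
```

With the settings from the example above (include_defaults and case_sensitive both false), this model would consult only the lowercased terms of the tech_terms dictionary.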
Dictionary Updates
The system supports dictionary updates by appending new documents to the storage index:
POST .analysis_ik/_doc
{
  "dictionary_name": "tech_terms",
  "dictionary_category": "main_dicts",
  "entries": "Neural Networks\nComputer Vision"
}
The analyzer detects new entries automatically by comparing timestamps and processes updates at one-minute intervals, so newly appended terms can take about a minute to take effect.
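The timestamp comparison can be sketched as follows. This is a minimal illustration of the idea, not the plugin's code: the DictionaryWatcher class and the timestamp field on each document are hypothetical, and the periodic scheduling (calling refresh once per minute) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryWatcher:
    """Tracks one dictionary and loads only documents newer than the last one seen."""
    dictionary_name: str
    last_seen: float = 0.0
    terms: set = field(default_factory=set)

    def refresh(self, docs: list) -> int:
        """Merge entries from documents newer than last_seen; return the number of new terms."""
        added = 0
        newest = self.last_seen
        for doc in docs:
            if doc["dictionary_name"] != self.dictionary_name:
                continue
            # `timestamp` is an assumed field name used only for this sketch.
            if doc["timestamp"] <= self.last_seen:
                continue
            for term in doc["entries"].splitlines():
                term = term.strip()
                if term and term not in self.terms:
                    self.terms.add(term)
                    added += 1
            newest = max(newest, doc["timestamp"])
        self.last_seen = newest
        return added
```

Because only documents with a newer timestamp are read, calling refresh repeatedly on an unchanged index does no extra work.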
Testing the Configuration
Analyze sample text to verify the tokenization:
POST tech_documents/_analyze
{
  "field": "description",
  "text": "Artificial Intelligence and Neural Networks"
}
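To check the result programmatically, the response can be compared against the terms you expect to see as single tokens. The sketch below assumes the standard _analyze response shape, a tokens array whose items carry a token string; the helper missing_terms is hypothetical, and it compares case-insensitively because case_sensitive is set to false in the example above.

```python
def missing_terms(response: dict, expected: list) -> list:
    """Return the expected terms that do not appear as tokens in an _analyze response."""
    produced = {t["token"].lower() for t in response.get("tokens", [])}
    return [term for term in expected if term.lower() not in produced]
```

An empty list means every expected dictionary term was emitted as a token; any term returned was either split into smaller pieces or not yet loaded.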