How to use the Elasticsearch word delimiter graph filter with synonyms
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-multiplexer-tokenfilter.html
1. approach
- The lowercase filter cannot be applied before the word_delimiter_graph filter, because word_delimiter_graph splits on lowercase-to-uppercase transitions, and those are gone once the text is lowercased.
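The ordering constraint can be checked with the analyze API. The body below is a minimal sketch meant to be sent to `POST _analyze`; the inline word_delimiter_graph filter with default settings and the sample text PowerShot are assumptions for illustration and are not part of the index definition in this post.

```json
{
  "tokenizer": "standard",
  "filter": [
    { "type": "word_delimiter_graph" },
    "lowercase"
  ],
  "text": "PowerShot"
}
```

In this order the response should contain the tokens power and shot. With "lowercase" moved to the front of the filter list, the case transition disappears before the delimiter filter sees it, and the single token powershot is expected instead.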
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_word_delimiter_filter",
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "my_word_delimiter_filter": {
          "type": "word_delimiter_graph",
          "generate_word_parts": true,
          "catenate_words": false
        },
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      }
    }
  }
}
- The problem: a synonym filter (synonym or synonym_graph) cannot be used after the word_delimiter_graph filter. Creating the index fails with:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Token filter [my_word_delimiter_filter] cannot be used to parse synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Token filter [my_word_delimiter_filter] cannot be used to parse synonyms"
  },
  "status": 400
}
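- The Elasticsearch documentation explains why: the synonym rules are parsed with the tokenizer and token filters that precede the synonym filter, and some filters, including word_delimiter_graph, cannot be used for that.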
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html

2. approach
- Use the synonym filter before the word_delimiter_graph filter
{
  "filter": [
    "my_synonym_filter",
    "my_word_delimiter_filter",
    "lowercase"
  ]
}
Problem:
- Synonym rule: gen, generation
- Input word: Gen
- The synonym is not applied, because the synonym filter now runs before lowercase and Gen does not match the lowercase rule gen.
- Listing every synonym in both upper and lower case is not an option.
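The case mismatch can be reproduced with the analyze API. This is a sketch meant to be sent to `POST _analyze`; it assumes the rule is given inline through "synonyms" instead of the synonyms.txt file, and that the word delimiter filter runs with its default settings.

```json
{
  "tokenizer": "standard",
  "filter": [
    { "type": "synonym", "synonyms": [ "gen, generation" ] },
    { "type": "word_delimiter_graph" },
    "lowercase"
  ],
  "text": "Gen"
}
```

For Gen the response is expected to contain only gen. Changing "text" to the lowercase gen should add generation at the same position.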
3. approach - use a multiplexer
- The multiplexer runs two filter chains on every token: the first splits on delimiters and lowercases, the second lowercases and applies the synonyms. Further filters, such as the stop word, stemmer, and unique filters of a production setup, can be appended to the second chain once they are defined under "filter".
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_multiplexer"
          ]
        }
      },
      "filter": {
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": [
            "my_word_delimiter_filter, lowercase",
            "lowercase, my_synonym_filter"
          ]
        },
        "my_word_delimiter_filter": {
          "type": "word_delimiter_graph",
          "generate_word_parts": true,
          "catenate_words": false
        },
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      }
    }
  }
}
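Once the index exists, the resulting analyzer can be inspected the same way. The body below is a sketch meant to be sent to `GET my_index/_analyze`; the sample text is an assumption and presumes that analysis/synonyms.txt contains the rule gen, generation from the second approach.

```json
{
  "analyzer": "my_analyzer",
  "text": "Gen PowerShot"
}
```

If the multiplexer is wired into my_analyzer, the response should show the lowercased word parts from the delimiter chain next to the synonym expansion for Gen. The multiplexer documentation linked at the top also warns that multi-word synonyms do not work reliably inside "filters", so such rules are worth checking in the same way.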