How to use the Elasticsearch word delimiter graph filter with synonyms

1. approach

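  • The lowercase filter cannot be applied before the word_delimiter_graph filter, because word_delimiter_graph splits tokens on lowercase-to-uppercase transitions. Lowercasing first would remove those transitions, so the order has to be word_delimiter_graph, then lowercase, then synonyms.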
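
The effect of the filter order can be checked with the _analyze API before creating any index. The two requests below are a minimal sketch: the sample text PowerShot is only an illustration and does not come from the original setup, and the inline filter definition mirrors my_word_delimiter_filter.

```
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "word_delimiter_graph", "generate_word_parts": true, "catenate_words": false },
    "lowercase"
  ],
  "text": "PowerShot"
}

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "word_delimiter_graph", "generate_word_parts": true, "catenate_words": false }
  ],
  "text": "PowerShot"
}
```

With the default settings, the first request should return the two tokens power and shot, while the second should return the single token powershot, because the case transition is already gone when word_delimiter_graph runs.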

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_word_delimiter_filter",
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "my_word_delimiter_filter": {
          "type": "word_delimiter_graph",
          "generate_word_parts": true,
          "catenate_words": false
        },
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      }
    }
  }
}
  • The problem: a synonym filter (synonym or synonym_graph) cannot be used after the word_delimiter_graph filter. Creating the index fails with:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Token filter [my_word_delimiter_filter] cannot be used to parse synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Token filter [my_word_delimiter_filter] cannot be used to parse synonyms"
  },
  "status": 400
}

2. approach

  • Apply the synonym filter before the word_delimiter_graph filter:

{
  "filter": [
    "my_synonym_filter",
    "my_word_delimiter_filter",
    "lowercase"
  ]
}

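Problem:

  • Synonym rule: gen, generation
  • Input word: Gen
  • The synonym is not applied, because the synonym filter now runs before lowercase and the rule gen does not match the capitalized token Gen.
  • Listing every synonym in both uppercase and lowercase is not an option.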
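
The missed match can be reproduced with the _analyze API. This is a sketch under two assumptions: my_index has been created with the filter order of this second approach, and analysis/synonyms.txt contains the rule gen, generation.

```
GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Gen"
}
```

Under those assumptions the response should contain only the token gen and no generation token. Sending the lowercase text gen instead should return both, which points to the casing as the cause.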

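3. approach - use a multiplexer

  • The analyzer only references the multiplexer. The first branch splits the token with word_delimiter_graph and lowercases the parts; the second branch lowercases the token and then applies the synonyms, so the synonym filter never follows word_delimiter_graph in the same chain.
  • Further filters (for example stop words, a stemmer override, a Snowball stemmer, or unique) can be appended to the second branch once they are defined under "filter".

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_multiplexer"
          ]
        }
      },
      "filter": {
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": [
            "my_word_delimiter_filter, lowercase",
            "lowercase, my_synonym_filter"
          ]
        },
        "my_word_delimiter_filter": {
          "type": "word_delimiter_graph",
          "generate_word_parts": true,
          "catenate_words": false
        },
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      }
    }
  }
}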
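
To check the result, the analyzer can be run against a sample text. This is a sketch that assumes my_analyzer actually routes its tokens through my_multiplexer and that analysis/synonyms.txt contains the rule gen, generation; the text Gen PowerShot is a made-up example.

```
GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Gen PowerShot"
}
```

If the setup works as intended, the response should include generation next to gen, as well as the parts power and shot. The multiplexer's preserve_original option defaults to true, so the unmodified tokens are emitted too unless it is set to false.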