
QueryDecompoundingFilter

This filter can be used in a search pipeline. It works as follows:

  • The filter loads a dictionary of nouns from the configured model repository.
  • For every search, the query is tokenized:
    • split on whitespace
    • split where digits and letters meet, for example: iphone14 -> iphone 14
    • split where a lowercase letter is followed by an uppercase letter, for example: SpeedPort -> speed port
  • Each resulting word is then decompounded.
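The tokenization steps above can be sketched in Python as follows. This is a minimal illustration, not the filter's actual code: the function name tokenize is hypothetical, and lowercasing every token is an assumption based on the SpeedPort -> speed port example.

```python
def tokenize(query: str) -> list[str]:
    """Hypothetical sketch of the query tokenization described above."""
    tokens: list[str] = []
    for chunk in query.split():  # 1. split on whitespace
        current = chunk[0]
        for prev, ch in zip(chunk, chunk[1:]):
            # 2. boundary between a digit and a non-digit character
            digit_change = prev.isdigit() != ch.isdigit()
            # 3. a lowercase letter followed by an uppercase letter
            case_change = prev.islower() and ch.isupper()
            if digit_change or case_change:
                tokens.append(current.lower())  # lowercasing is an assumption
                current = ch
            else:
                current += ch
        tokens.append(current.lower())
    return tokens


print(tokenize("iphone14 SpeedPort"))  # ['iphone', '14', 'speed', 'port']
```

Scanning adjacent character pairs keeps each rule independent, so further split rules could be added as extra boolean conditions.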
The filter is added to the filterList of a pipeline definition, for example:

```yaml
!!com.quasiris.qsf.pipeline.Pipeline
id: search-pipeline
timeout: 4000
filterList:
  - !!com.quasiris.qsc.decompound.QueryDecompoundingFilter
    active: true
    id: search-pipeline.QueryDecompoundingFilter
    modelShortId: "com.quasiris.dias|compounds|1.0.11"
```

Dictionary - nouns.txt

This dictionary contains a list of nouns. The nouns are extracted from a domain-specific text corpus.

```
abbau
abbildung
abbinder
abbruch
abbruchgründe
abbruchmeldung
abbrüche
abbucher
```
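This page does not describe the decompounding algorithm itself. As a hypothetical illustration only, the following sketch splits a word into entries of a noun dictionary by recursive search, preferring the split with the fewest parts so that nouns already listed in the dictionary (such as abbruchmeldung) stay whole. The function name decompound, the fewest-parts rule, and the minimum part length min_len are all assumptions.

```python
from functools import lru_cache


def decompound(word: str, nouns: set[str], min_len: int = 3) -> list[str]:
    """Hypothetical dictionary-based splitter; not the filter's real algorithm.

    Returns the split into dictionary nouns with the fewest parts, or [word]
    if the word cannot be covered completely by dictionary entries.
    """

    @lru_cache(maxsize=None)
    def best_split(start: int):
        if start == len(word):
            return ()  # nothing left to split
        best = None
        # Try the longest candidate part first; parts shorter than min_len are ignored.
        for end in range(len(word), start + min_len - 1, -1):
            part = word[start:end]
            if part in nouns:
                rest = best_split(end)
                if rest is not None:
                    candidate = (part,) + rest
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best  # None if no complete split exists from this position

    result = best_split(0)
    return list(result) if result else [word]


nouns = {"router", "sim", "karte"}
print(decompound("routersimkarte", nouns))  # ['router', 'sim', 'karte']
```

Memoizing on the start position keeps the search polynomial in the word length instead of exploring every split combination repeatedly.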

Blacklist - nouns-blacklist.txt

The dictionary nouns.txt is large and serves as the basis for many searches. To remove nouns for a specific search, add them to the blacklist.

```
bestell
matrix
```

Whitelist - nouns-whitelist.txt

The dictionary nouns.txt is large and serves as the basis for many searches. To add nouns for a specific search, add them to the whitelist.

```
quasiris
```
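A minimal sketch of how the blacklist and whitelist might be combined with nouns.txt, assuming each file holds one word per line and all three files sit in one model directory. The helper names, the lowercasing of entries, and the rule that the blacklist wins when a word appears in both lists are assumptions, not documented behavior.

```python
from pathlib import Path


def load_words(path: Path) -> set[str]:
    """Read one word per line (lowercased), skipping blank lines; a missing file yields an empty set."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def effective_nouns(nouns: set[str], whitelist: set[str], blacklist: set[str]) -> set[str]:
    # Whitelisted words are added, then blacklisted words are removed,
    # so a word present in both lists ends up excluded (assumed precedence).
    return (nouns | whitelist) - blacklist


def load_dictionary(model_dir: Path) -> set[str]:
    # File names are taken from this page; the shared directory is an assumption.
    return effective_nouns(
        load_words(model_dir / "nouns.txt"),
        load_words(model_dir / "nouns-whitelist.txt"),
        load_words(model_dir / "nouns-blacklist.txt"),
    )


print(sorted(effective_nouns({"abbau", "bestell", "matrix"}, {"quasiris"}, {"bestell", "matrix"})))
# ['abbau', 'quasiris']
```

Treating the lists as set operations keeps the large shared nouns.txt untouched while each search applies its own additions and removals.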

Exceptions - decompound-exceptions.txt

Languages have many exceptions to their rules, and the algorithm sometimes does not decompound a word in the expected way. Entries in the exceptions file override the algorithm.

The left side of each entry is the word to be decompounded; the right side lists its parts, separated by |.

```
bestellhotline -> bestell|hotline
routersimkarte -> router|sim|karte
schiffahrt -> schiff|fahrt
donaudampfschifffahrt -> donau|dampf|schiff|fahrt
vertragswechselmatrix -> vertrag|wechsel|matrix
```
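The exceptions format (word -> part|part|...) can be parsed into a lookup table and consulted before the algorithm runs, as in the following sketch. The function names and the case-insensitive lookup are hypothetical; the only behavior taken from this page is that an exception overrides the algorithm.

```python
from typing import Callable, Iterable


def parse_exceptions(lines: Iterable[str]) -> dict[str, list[str]]:
    """Parse lines of the form 'word -> part|part|...' into a lookup table."""
    exceptions: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or malformed lines
        word, parts = line.split("->", 1)
        exceptions[word.strip().lower()] = [p.strip() for p in parts.split("|") if p.strip()]
    return exceptions


def decompound_with_exceptions(
    word: str,
    exceptions: dict[str, list[str]],
    algorithm: Callable[[str], list[str]],
) -> list[str]:
    # An exception entry overrides whatever the algorithm would return.
    key = word.lower()
    if key in exceptions:
        return list(exceptions[key])
    return algorithm(key)


table = parse_exceptions([
    "bestellhotline -> bestell|hotline",
    "schiffahrt -> schiff|fahrt",
])
print(decompound_with_exceptions("Bestellhotline", table, lambda w: [w]))
# ['bestell', 'hotline']
```

Passing the algorithm in as a function keeps the override independent of how the regular decompounding is implemented.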