QueryDecompoundingFilter
The QueryDecompoundingFilter splits compound words in the search query into their parts. It can be used in a search pipeline and works as follows:
- the filter loads a dictionary of nouns from the configured model repository
- for every search, the query is tokenized:
  - split on whitespace
  - split where letters and digits meet - example: iphone14 -> iphone 14
  - split where a lowercase letter is followed by an uppercase letter - example: SpeedPort -> speed port
- each token is then decompounded
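The tokenization steps above can be sketched in Python as follows. This is a hypothetical illustration, not the filter's actual code: the regular expression, the `tokenize` name, and the final lowercasing step (inferred from the SpeedPort -> speed port example) are assumptions.

```python
import re
from typing import List

# Hypothetical reconstruction of the tokenization rules; not the filter's source.
# All boundaries are zero-width (lookbehind + lookahead), so no characters are lost.
_BOUNDARY = re.compile(
    r"(?<=[^\W\d_])(?=\d)"          # letter followed by digit: iphone|14
    r"|(?<=\d)(?=[^\W\d_])"         # digit followed by letter: 14|pro
    r"|(?<=[a-zäöüß])(?=[A-ZÄÖÜ])"  # lowercase followed by uppercase: Speed|Port
)


def tokenize(query: str) -> List[str]:
    tokens = []
    for chunk in query.split():              # 1. split on whitespace
        for part in _BOUNDARY.split(chunk):  # 2./3. split on digit and case changes
            if part:
                tokens.append(part.lower())  # assumption: tokens are lowercased
    return tokens


print(tokenize("iphone14 SpeedPort"))  # ['iphone', '14', 'speed', 'port']
```

Splitting on zero-width boundaries (supported by `re.split` since Python 3.7) keeps every character of the original chunk, so the tokens always concatenate back to the input apart from whitespace and case.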
Example pipeline configuration:
!!com.quasiris.qsf.pipeline.Pipeline
id: search-pipeline
timeout: 4000
filterList:
  - !!com.quasiris.qsc.decompound.QueryDecompoundingFilter
    active: true
    id: search-pipeline.QueryDecompoundingFilter
    modelShortId: "com.quasiris.dias|compounds|1.0.11"
Dictionary - nouns.txt
This dictionary contains a list of nouns, one per line. The nouns are extracted from a domain-specific text corpus.
abbau
abbildung
abbinder
abbruch
abbruchgründe
abbruchmeldung
abbrüche
abbucher
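How the filter decompounds a word using this dictionary is not described here. As an illustration only, the following sketch splits a word into the smallest number of dictionary nouns using dynamic programming. The function name `decompound`, the minimum part length `min_len`, and the fewest-parts preference are assumptions, not the filter's documented behavior.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple


def decompound(word: str, nouns: FrozenSet[str], min_len: int = 3) -> List[str]:
    """Split `word` into the fewest nouns from `nouns`.

    Illustrative sketch only: exact dictionary matches, no linking elements.
    `min_len` (hypothetical) ignores very short parts. Returns [word] unsplit
    if no complete split exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Fewest-parts split of word[start:], or None if the suffix can't be covered.
        if start == len(word):
            return ()
        result = None
        for end in range(start + min_len, len(word) + 1):
            head = word[start:end]
            if head in nouns:
                rest = best(end)
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (head,) + rest
        return result

    parts = best(0)
    return list(parts) if parts else [word]


nouns = frozenset({"abbruch", "meldung", "router", "sim", "karte"})
print(decompound("Abbruchmeldung", nouns))  # ['abbruch', 'meldung']
print(decompound("routersimkarte", nouns))  # ['router', 'sim', 'karte']
```

Because this sketch only matches exact dictionary entries, it leaves a word such as vertragswechselmatrix unsplit (the linking s in "vertrags" is not a noun); with this sketch, such words would need an entry in the exceptions file.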
Blacklist - nouns-blacklist.txt
The dictionary nouns.txt is a large, general list that serves as the basis for many searches. To remove nouns for a specific search, add them to the blacklist.
bestell
matrix
Whitelist - nouns-whitelist.txt
The dictionary nouns.txt is a large, general list that serves as the basis for many searches. To add nouns for a specific search, add them to the whitelist.
quasiris
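A minimal sketch of how nouns.txt, the whitelist and the blacklist might be combined into the effective dictionary. The function names, the single-directory file layout, and the rule that the blacklist wins when a noun appears in both lists are assumptions, not documented behavior.

```python
from pathlib import Path
from typing import FrozenSet, Iterable, Set


def read_entries(lines: Iterable[str]) -> Set[str]:
    # One noun per line; blank lines are ignored, entries are lowercased.
    return {line.strip().lower() for line in lines if line.strip()}


def effective_nouns(nouns: Iterable[str], whitelist: Iterable[str],
                    blacklist: Iterable[str]) -> FrozenSet[str]:
    # Assumption: whitelist entries are added first, then blacklist entries
    # are removed, so a noun listed in both files is dropped.
    return frozenset((read_entries(nouns) | read_entries(whitelist))
                     - read_entries(blacklist))


def load_nouns(model_dir: Path) -> FrozenSet[str]:
    # Hypothetical layout: all three files in one model directory;
    # a missing whitelist or blacklist counts as empty.
    def lines(name: str) -> Iterable[str]:
        path = model_dir / name
        return path.read_text(encoding="utf-8").splitlines() if path.exists() else []

    return effective_nouns(lines("nouns.txt"), lines("nouns-whitelist.txt"),
                           lines("nouns-blacklist.txt"))


print(sorted(effective_nouns(["abbau", "bestell", "matrix"], ["quasiris"],
                             ["bestell", "matrix"])))  # ['abbau', 'quasiris']
```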
Exceptions - decompound-exceptions.txt
Natural language has many exceptions to its rules, so the algorithm sometimes does not decompound a word as expected. Entries in this file override the algorithm.
Each line contains the word to be decompounded on the left side of -> and its parts, separated by |, on the right side:
bestellhotline -> bestell|hotline
routersimkarte -> router|sim|karte
schiffahrt -> schiff|fahrt
donaudampfschifffahrt -> donau|dampf|schiff|fahrt
vertragswechselmatrix -> vertrag|wechsel|matrix
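The exceptions format above can be parsed and applied as sketched below, assuming an exception is checked before the algorithm runs and lookups are case-insensitive. `parse_exceptions`, `decompound_with_exceptions` and the `fallback` parameter are hypothetical names; `fallback` stands in for whatever splitting the filter normally performs.

```python
from typing import Callable, Dict, Iterable, List


def parse_exceptions(lines: Iterable[str]) -> Dict[str, List[str]]:
    # Format per line: "word -> part1|part2|..."; other lines are skipped.
    exceptions = {}
    for line in lines:
        if "->" not in line:
            continue
        word, parts = line.split("->", 1)
        exceptions[word.strip().lower()] = [p.strip() for p in parts.split("|") if p.strip()]
    return exceptions


def decompound_with_exceptions(word: str, exceptions: Dict[str, List[str]],
                               fallback: Callable[[str], List[str]]) -> List[str]:
    # Assumption: an exception entry is checked first and fully replaces the algorithm.
    key = word.lower()
    if key in exceptions:
        return list(exceptions[key])
    return fallback(key)


exceptions = parse_exceptions([
    "schiffahrt -> schiff|fahrt",
    "vertragswechselmatrix -> vertrag|wechsel|matrix",
])
print(decompound_with_exceptions("Schiffahrt", exceptions, lambda w: [w]))  # ['schiff', 'fahrt']
```

Returning a copy of the stored parts keeps callers from accidentally modifying the loaded exception table.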