Spellcheck configuration
The QSC Spellcheck uses the search data as the basis for spell correction. In some rare cases it is useful to manage the spell correction manually:
- manual corrections: misspelled words that the algorithm cannot correct automatically
- misspelled words in the data: the search data itself contains the wrong spelling
- correct words: the searched word is correct but does not occur in the data
Manual corrections
In some rare cases the searched word cannot be corrected automatically. In these cases a manual spell correction can be used.
An example of such a case is a word that is typed the way it is pronounced, which differs a lot from its correct spelling:
- The correct spelling is wholesale
- The written pronunciation is holseil
Misspelled words in the data
The quality of the search data is often poor: many misspellings are already present in the data. In these cases the misspelled words can be managed manually.
An example of such a case:
- The correct spelling is Terrasse
- The spelling in the product data is terasse
Correct words
In some cases the searched words do not belong to the domain, so they do not occur in the search data.
An example is a search for the names of managers in a product store. If the search data only contains information about products and not about people, this can lead to wrong corrections.
- The correct spelling is Steve Jobs
- If not managed, the query might get corrected to Steve Flops, which leads to a specific Flip Flops brand
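The three cases above can be pictured as entries in a small lookup that is consulted before any automatic correction. The following Python sketch only illustrates that idea and is not how QSC implements it: the function name apply_manual_entries and the auto_correct callback are hypothetical, and the entry fields are simplified for the example.

```python
from typing import Callable, Dict, List

# Hypothetical entries covering the three cases described above.
ENTRIES: List[Dict] = [
    {"type": "misspell", "correctSpelling": "wholesale", "misspellings": ["holseil"]},
    {"type": "misspell", "correctSpelling": "Terrasse", "misspellings": ["terasse"]},
    {"type": "correct", "correctSpelling": "Steve Jobs", "misspellings": []},
]


def apply_manual_entries(query: str, entries: List[Dict],
                         auto_correct: Callable[[str], str]) -> str:
    """Return the query after consulting manual entries first (illustrative only)."""
    normalized = query.strip().lower()
    for entry in entries:
        # A word marked as correct is returned unchanged and never auto-corrected.
        if entry["type"] == "correct" and entry["correctSpelling"].lower() == normalized:
            return query
        # A known misspelling maps directly to its managed correct spelling.
        if entry["type"] == "misspell" and normalized in (m.lower() for m in entry["misspellings"]):
            return entry["correctSpelling"]
    # Anything else falls through to the automatic, data-driven correction.
    return auto_correct(query)
```

With these entries, "holseil" resolves to "wholesale", while "Steve Jobs" is passed through untouched even if the automatic correction would have changed it.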
Export API
The QSC provides a REST API to export Spellcheck configurations as CSV and JSON.
To export the spellcheck data, a tenant and a code must be provided. The Export API is secured by a static token sent in the X-QSC-Token header. See the security section of the admin UI: https://qsc.quasiris.de/admin/#/security/playground/api-key
- url: https://qsc.quasiris.de/api/v1/admin/list/export/{tenant}/{code}?type=json
- header: X-QSC-Token: **********
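As a minimal sketch, the request described above could be assembled in Python with the standard library. The helper name build_export_request is hypothetical, and any value for the type parameter other than json (for example csv) is an assumption based on the statement that CSV export is supported.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://qsc.quasiris.de/api/v1/admin/list/export"


def build_export_request(tenant: str, code: str, token: str,
                         export_type: str = "json") -> Request:
    """Build (but do not send) the GET request for a spellcheck export."""
    # Tenant and code are path segments, the export format is a query parameter.
    path = f"{quote(tenant, safe='')}/{quote(code, safe='')}"
    url = f"{BASE_URL}/{path}?{urlencode({'type': export_type})}"
    request = Request(url, method="GET")
    # The static token is sent in the X-QSC-Token header.
    request.add_header("X-QSC-Token", token)
    if export_type == "json":
        request.add_header("Accept", "application/json")
    return request
```

The request can then be sent with urllib.request.urlopen(build_export_request("playground", "spellcheck", token)), where token holds a valid API key.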
Example: export as JSON
This example shows a spellcheck export for the tenant playground and the code spellcheck:
- tenant: playground
- code: spellcheck
curl \
  -H "Accept: application/json" \
  -H "X-QSC-Token: ************" \
  "https://qsc.quasiris.de/api/v1/admin/list/export/playground/spellcheck?type=json"
[
  {
    "correctSpelling": "wholesale",
    "misspellings": [
      "holseil"
    ],
    "weight": "100",
    "id": 14028,
    "type": "misspell"
  },
  {
    "correctSpelling": "Terrasse",
    "misspellings": [
      "terasse"
    ],
    "weight": "100",
    "id": 14029,
    "type": "misspell"
  },
  {
    "correctSpelling": "Steve Jobs",
    "misspellings": "",
    "weight": "100",
    "id": 14030,
    "type": "correct"
  }
]
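A consumer of this export might turn it into a lookup structure. The sketch below is one possible way to do that in Python; the function name build_lookup is hypothetical, and the handling of misspellings is based only on the sample above, where the field is a list for misspell entries and an empty string for correct entries.

```python
import json
from typing import Dict, Set, Tuple


def build_lookup(export_json: str) -> Tuple[Dict[str, str], Set[str]]:
    """Split exported entries into a misspelling map and a set of protected words."""
    corrections: Dict[str, str] = {}
    protected: Set[str] = set()
    for entry in json.loads(export_json):
        if entry["type"] == "correct":
            # Words of type "correct" must never be changed by the spellcheck.
            protected.add(entry["correctSpelling"])
            continue
        # Normalise defensively: the field may be a list or an empty string.
        misspellings = entry.get("misspellings") or []
        if isinstance(misspellings, str):
            misspellings = [misspellings]
        for wrong in misspellings:
            corrections[wrong] = entry["correctSpelling"]
    return corrections, protected
```

For the export shown above, corrections maps holseil to wholesale and terasse to Terrasse, and protected contains Steve Jobs.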
Configuration
To configure a spellcheck for a search, a custom pipeline must be used and the spellcheck index must be configured in the search configuration.
- Configure a parallel filter that executes the spellcheck in parallel with the search query.
- The QscSpellcheckEnabledFilter checks whether the conditions for running a spellcheck are met.
- The SpellCheckElasticParallelFilter checks every token against the spellcheck index.
- The QscSpellcheckDecisionFilter restarts the pipeline with the corrected query.
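The interplay of the three filters can be sketched in a few lines of Python. This is a simplified illustration, not the actual filter code: the spellcheck index is replaced by a plain dictionary, the function name spellcheck_flow is hypothetical, and the assumption that tokens outside a minimum and maximum length are skipped is inferred from the token length settings of the pipeline, not confirmed by it.

```python
from typing import Dict, Optional


def spellcheck_flow(query: str, enabled: bool, index: Dict[str, str],
                    min_len: int = 4, max_len: int = 10) -> Optional[str]:
    """Return a corrected query to restart the search with, or None to keep the original."""
    # Step 1 (role of QscSpellcheckEnabledFilter): skip everything if disabled.
    if not enabled:
        return None
    # Step 2 (role of SpellCheckElasticParallelFilter): look up each token on its own.
    # Tokens outside the assumed length bounds are left unchanged.
    tokens = query.split()
    corrected = [
        index.get(token.lower(), token) if min_len <= len(token) <= max_len else token
        for token in tokens
    ]
    # Step 3 (role of QscSpellcheckDecisionFilter): restart only if something changed.
    return " ".join(corrected) if corrected != tokens else None
```

Returning None stands for "keep the original search result", while a returned string stands for the query the pipeline is restarted with.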
Pipeline:
!!com.quasiris.qsf.pipeline.Pipeline
id: search-pipeline
timeout: 10000
filterList:
  - !!com.quasiris.qsf.pipeline.filter.DebugFilter
    active: true
    id: search-pipeline.DebugFilter
  - !!com.quasiris.qsc.search.service.query.QscSearchQueryFilter
    active: true
  - !!com.quasiris.qsf.pipeline.filter.ParallelFilter
    active: true
    id: search.ParallelFilter
    pipelines:
      - filterList:
          - !!com.quasiris.qsf.pipeline.filter.elastic.ElasticFilter
            active: true
            baseUrl: {{searchIndexBaseUrl}}
            id: search-result.elasticFilter
            resultSetId: {{searchCode}}
            queryTransformer: !!com.quasiris.qsc.search.service.QscSearchQueryTransformer
              profile: {{profile}}
            searchResultTransformer: !!com.quasiris.qsc.search.service.QscResultTransformer
        id: search-results
        timeout: 10000
      - filterList:
          - !!com.quasiris.qsc.search.service.spellcheck.QscSpellcheckEnabledFilter
            active: true
            id: qsc-spellcheck.QscSpellcheckEnabledFilter
          - !!com.quasiris.qsf.pipeline.filter.TokenizerFilter
            active: true
            id: qsc-spellcheck.TokenizerFilter
          - !!com.quasiris.qsf.pipeline.filter.elastic.spellcheck.SpellCheckElasticParallelFilter
            active: true
            baseUrl: {{spellcheckIndexBaseUrl}}
            id: spellcheck
            maxTokenLenght: 10
            minTokenLenght: 4
            minTokenWeight: 1
        id: spellcheck
        timeout: 1000
  - !!com.quasiris.qsc.search.service.spellcheck.QscSpellcheckDecisionFilter
    active: true
    id: search-pipeline.QscSpellcheckDecisionFilter
    resultId: {{searchCode}}
    restartPipelineId: "search-pipeline"
  - !!com.quasiris.qsf.pipeline.filter.QSFQLResponseRefinementFilter
    active: true
    id: search-pipeline.QSFQLResponseRefinementFilter
    resultId: {{searchCode}}
  - !!com.quasiris.qsc.search.service.QscTrackingFilter
    active: true
    baseUrl: {{trackingIndexBaseUrl}}
    customParameter:
      app: search
      searchCode: {{searchCode}}
      tenantCode: {{tenant}}
      env: {{env}}
    id: search-pipeline.ElasticTrackingFilter
    idFieldName: id
    resultSetId: {{searchCode}}
    rotation: none
    trackingId: null
Search configuration:
{
  "spellcheck": {
    "indexCode": "spellcheck"
  }
}
If no configuration exists, the default $searchIndex-spellcheck is used.
It is recommended to configure the indexCode.
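The fallback rule for the index code can be expressed as a short Python sketch. The function name resolve_spellcheck_index and the shape of the search_config dictionary are assumptions that mirror the JSON snippet above; the real resolution happens inside QSC.

```python
from typing import Any, Dict


def resolve_spellcheck_index(search_index: str, search_config: Dict[str, Any]) -> str:
    """Return the configured spellcheck index code, or the default derived from the search index."""
    spellcheck = search_config.get("spellcheck") or {}
    # Without an explicit indexCode, fall back to "<searchIndex>-spellcheck".
    return spellcheck.get("indexCode") or f"{search_index}-spellcheck"
```

For a search index named products (a made-up name), an empty configuration yields products-spellcheck, whereas the configuration shown above yields spellcheck.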