Spellcheck configuration

The QSC Spellcheck uses the search data as the basis for spell correction. In a few rare cases it is useful to manage the spell correction manually:

  • manual corrections: misspelled words that the algorithm cannot correct automatically
  • misspelled words in the data
  • correct words: the searched word is correct but does not occur in the data

Manual corrections

In some rare cases the searched word cannot be corrected automatically. In these cases a manual spell correction can be defined.

An example of such a case is a phonetic spelling that differs greatly from the correct spelling of the word:

  • The correct spelling is wholesale
  • The phonetic spelling is holseil

Misspelled words in the data

The quality of the search data is often poor: many misspellings are already contained in the data itself. These misspelled words can be managed so that they are mapped to the correct spelling.

An example of such a case:

  • The correct spelling is Terrasse
  • The spelling in the product data is terasse

Correct words

In some cases the searched words do not belong to the domain, so they are not in the search data.

An example is a search for the names of people from the management in a product store. If the search data contains only information about products, but not about people, this can lead to wrong corrections.

  • The correct spelling is Steve Jobs
  • If not managed, the query might get corrected to Steve Flops, which leads to a specific Flip Flops brand
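
The effect described above can be illustrated with a toy corrector. This is only a sketch: it uses Python's difflib for fuzzy matching, which is not the QSC algorithm, and the names correct_query, vocabulary and protected are hypothetical. It shows how a word that is missing from the data gets pulled towards a similar word in the data unless it is marked as correct.

```python
from difflib import get_close_matches


def correct_query(query, vocabulary, protected=frozenset()):
    """Toy per-token corrector (illustration only, not the QSC algorithm)."""
    corrected = []
    for token in query.lower().split():
        # Known or explicitly protected words are never changed.
        if token in protected or token in vocabulary:
            corrected.append(token)
            continue
        # Otherwise pick the most similar word from the data, if there is one.
        matches = get_close_matches(token, vocabulary, n=1, cutoff=0.4)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


# Hypothetical product vocabulary that knows nothing about people.
vocabulary = ["flip", "flops", "sandals"]

print(correct_query("steve jobs", vocabulary))
# steve flops
print(correct_query("steve jobs", vocabulary, protected={"steve", "jobs"}))
# steve jobs
```

Marking a word as correct corresponds to adding it to the protected set in this sketch.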

Export API

The QSC provides a REST API to export spellcheck configurations as CSV and JSON.

To export the spellcheck data, a tenant and a code must be provided. The Export API is secured by a static token that is sent in the X-QSC-Token header. For the token, see the security settings: https://qsc.quasiris.de/admin/#/security/playground/api-key

Example: export as JSON

This example shows a spellcheck export for the playground tenant and the code spellcheck:

  • tenant: playground
  • code: spellcheck
curl \
  -H "Accept: application/json" \
  -H "X-QSC-Token: ************" \
  "https://qsc.quasiris.de/api/v1/admin/list/export/playground/spellcheck?type=json"
[
  {
    "correctSpelling": "wholesale",
    "misspellings": [
      "holseil"
    ],
    "weight": "100",
    "id": 14028,
    "type": "misspell"
  },
  {
    "correctSpelling": "Terrasse",
    "misspellings": [
      "terasse"
    ],
    "weight": "100",
    "id": 14029,
    "type": "misspell"
  },
  {
    "correctSpelling": "Steve Jobs",
    "misspellings": "",
    "weight": "100",
    "id": 14030,
    "type": "correct"
  }
]
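
The export can also be consumed from a script. The following is a minimal sketch, assuming the URL pattern and header from the curl example above. The helper names build_export_request and build_correction_map are hypothetical and not part of the QSC. The sketch builds the request and turns the exported entries into a lookup table plus a set of words marked as correct. It also covers the case where misspellings is an empty string instead of a list, as in the entry of type correct.

```python
import json
import urllib.request

# URL pattern taken from the curl example; tenant and code are appended.
BASE_URL = "https://qsc.quasiris.de/api/v1/admin/list/export"


def build_export_request(tenant, code, token, export_type="json"):
    """Build (but do not send) the export request."""
    url = f"{BASE_URL}/{tenant}/{code}?type={export_type}"
    headers = {"Accept": "application/json", "X-QSC-Token": token}
    return urllib.request.Request(url, headers=headers)


def build_correction_map(entries):
    """Split exported entries into misspelling -> correction and protected words."""
    corrections = {}
    protected = set()
    for entry in entries:
        if entry["type"] == "correct":
            protected.add(entry["correctSpelling"])
            continue
        misspellings = entry["misspellings"]
        # The export may contain "" instead of a list.
        if isinstance(misspellings, str):
            misspellings = [misspellings] if misspellings else []
        for misspelling in misspellings:
            corrections[misspelling] = entry["correctSpelling"]
    return corrections, protected


# Sending the request would look like this (needs a valid token):
#   request = build_export_request("playground", "spellcheck", "<token>")
#   with urllib.request.urlopen(request) as response:
#       entries = json.load(response)

# Offline demonstration with a shortened copy of the response shown above.
entries = json.loads("""
[
  {"correctSpelling": "wholesale", "misspellings": ["holseil"], "type": "misspell"},
  {"correctSpelling": "Steve Jobs", "misspellings": "", "type": "correct"}
]
""")
corrections, protected = build_correction_map(entries)
print(corrections)
print(protected)
```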

Configuration

To configure a spellcheck for a search, a custom pipeline must be used, and the spellcheck index must be set in the search configuration.

  • configure a parallel filter that executes the spellcheck in parallel with the search query
  • the QscSpellcheckEnabledFilter checks whether the conditions for a spellcheck are met
  • the SpellCheckElasticParallelFilter checks every token against the spellcheck index
  • the QscSpellcheckDecisionFilter restarts the pipeline with the corrected query
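
The control flow of these steps can be sketched in a few lines. This is a simplified model under stated assumptions, not the QSF implementation: run_pipeline, spellcheck, search and lookup are hypothetical names, the lookup dictionary stands in for the spellcheck index, and the token length limits are assumed to mean that only tokens within the limits are checked. The real decision logic of the QscSpellcheckDecisionFilter is not described here; the sketch simply restarts whenever the query changed.

```python
from concurrent.futures import ThreadPoolExecutor

# Values taken from the pipeline configuration (minTokenLenght / maxTokenLenght).
MIN_TOKEN_LENGTH = 4
MAX_TOKEN_LENGTH = 10


def spellcheck(query, lookup):
    """Check every token against the lookup (stand-in for the spellcheck index)."""
    corrected = []
    for token in query.split():
        # Assumption: tokens outside the length limits are left untouched.
        if MIN_TOKEN_LENGTH <= len(token) <= MAX_TOKEN_LENGTH:
            token = lookup.get(token.lower(), token)
        corrected.append(token)
    return " ".join(corrected)


def run_pipeline(query, search, lookup, spellcheck_enabled=True):
    """Run search and spellcheck in parallel, then decide whether to restart."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        search_future = pool.submit(search, query)
        spell_future = (
            pool.submit(spellcheck, query, lookup) if spellcheck_enabled else None
        )
        results = search_future.result()
        corrected = spell_future.result() if spell_future else query
    if corrected != query:
        # Restart: run the search again with the corrected query.
        return corrected, search(corrected)
    return query, results


def fake_search(query):
    """Hypothetical search backend that just echoes the query."""
    return [f"results for {query}"]


print(run_pipeline("holseil prices", fake_search, {"holseil": "wholesale"}))
```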

Pipeline:

!!com.quasiris.qsf.pipeline.Pipeline
id: search-pipeline
timeout: 10000
filterList:
  - !!com.quasiris.qsf.pipeline.filter.DebugFilter
    active: true
    id: search-pipeline.DebugFilter
  - !!com.quasiris.qsc.search.service.query.QscSearchQueryFilter
    active: true
  - !!com.quasiris.qsf.pipeline.filter.ParallelFilter
    active: true
    id: search.ParallelFilter
    pipelines:
      - filterList:
          - !!com.quasiris.qsf.pipeline.filter.elastic.ElasticFilter
            active: true
            baseUrl: {{searchIndexBaseUrl}}
            id: search-result.elasticFilter
            resultSetId: {{searchCode}}
            queryTransformer: !!com.quasiris.qsc.search.service.QscSearchQueryTransformer
              profile: {{profile}}
            searchResultTransformer: !!com.quasiris.qsc.search.service.QscResultTransformer
        id: search-results
        timeout: 10000
      - filterList:
          - !!com.quasiris.qsc.search.service.spellcheck.QscSpellcheckEnabledFilter
            active: true
            id: qsc-spellcheck.QscSpellcheckEnabledFilter
          - !!com.quasiris.qsf.pipeline.filter.TokenizerFilter
            active: true
            id: qsc-spellcheck.TokenizerFilter
          - !!com.quasiris.qsf.pipeline.filter.elastic.spellcheck.SpellCheckElasticParallelFilter
            active: true
            baseUrl: {{spellcheckIndexBaseUrl}}
            id: spellcheck
            maxTokenLenght: 10
            minTokenLenght: 4
            minTokenWeight: 1
        id: spellcheck
        timeout: 1000
  - !!com.quasiris.qsc.search.service.spellcheck.QscSpellcheckDecisionFilter
    active: true
    id: search-pipeline.QscSpellcheckDecisionFilter
    resultId: {{searchCode}}
    restartPipelineId: "search-pipeline"
  - !!com.quasiris.qsf.pipeline.filter.QSFQLResponseRefinementFilter
    active: true
    id: search-pipeline.QSFQLResponseRefinementFilter
    resultId: {{searchCode}}
  - !!com.quasiris.qsc.search.service.QscTrackingFilter
    active: true
    baseUrl: {{trackingIndexBaseUrl}}
    customParameter:
      app: search
      searchCode: {{searchCode}}
      tenantCode: {{tenant}}
      env: {{env}}
    id: search-pipeline.ElasticTrackingFilter
    idFieldName: id
    resultSetId: {{searchCode}}
    rotation: none
    trackingId: null

Search configuration:

{
  "spellcheck": {
    "indexCode": "spellcheck"
  }
}

If no configuration exists, the default $searchIndex-spellcheck is used. It is recommended to configure the indexCode explicitly.
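
The fallback rule can be written down as a small function. This is a sketch only: resolve_spellcheck_index is a hypothetical helper, and it assumes that $searchIndex stands for the name of the search index.

```python
def resolve_spellcheck_index(search_config, search_index):
    """Return the configured indexCode, or the default <searchIndex>-spellcheck."""
    index_code = search_config.get("spellcheck", {}).get("indexCode")
    return index_code or f"{search_index}-spellcheck"


# With an explicit configuration the indexCode wins.
print(resolve_spellcheck_index({"spellcheck": {"indexCode": "spellcheck"}}, "products"))
# spellcheck

# Without configuration the default name is derived from the search index.
print(resolve_spellcheck_index({}, "products"))
# products-spellcheck
```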