Skip to main content

Facet generation

Introduction

A comfortable way of searching through data is by using facets. Using a faceted search makes it possible to filter all kinds of properties. To supply meaningful facets to users by hand is a lot of work, especially in scenarios with tens of thousands of searchable items.

Finding meaningful facets

With facet generation it is possible to find the most meaningful attributes within each item category, based on statistical analysis of your feeding data. The process scores each attribute used in your data, by analyzing features such as the average position of the attribute, or the coverage of the attribute amongst all items in the category (see Configuring features JSON for all available features).

Usage

Requirements

  • You need to have a QSC feeding API prepared, to which data was already fed
    • Your fed data needs to have used the category and attribute format described in the Feeding API Integration documentation
    • We can create a custom converter for your data, if it is not in the required format
  • You need to have a QSC Admin config prepared, into which the generated facets will be written
    • To represent different sets of facets for different states of your feeding data, you can use different resulting configs
  • Your feeding items need to have an id element. This element can be:
    • A unique string
    • Or a list of unique strings, from which the first one will be used as identifier
    • Or any other element.
      • In this case the process identifies the feeding items by an internal counter and thus includes duplicate items
  • Your feeding items need to have a category element. This element can be:
    • A unique string
    • A list of unique strings, from which every string is used. The feeding item is considered on scoring attributes of all categories it is part of.
    • A category traverse element with the structure described in QSC Admin Feeding API Doc
      • In this case only leaf categories (with the highest level) of the traverse are considered

Configuration

Standard Run

When using default settings, you have to choose the feeding API for which you want to generate facets for, and the results config in which the resulting facets should be stored. Notice that you might have to change your item id and categories id element identifier if you aren't using "id" and "categories" as identifiers in your feeding items

TODO: how to select feeding API and results config in QSC UI

Custom Run

To configure the facet generation yourself, you have to create a new config in QSC Admin: How to use configs in QSC Admin.

The config is a JSON which must contain at least

  • "config-api-results-url": The url of the config in which the generated facets should be stored
  • "config-api-results-token": A token to access the config in which the generated facets should be stored
  • "config-api-results-code": The code of the config in which the generated facets should be stored
    • Note that this code also overwrites the name of the config on uploading on the results, if code and name were not equal before
  • "config-api-results-tenant-code": Your tenant code
  • "feeding-api-url": The feeding API for which items you want to generate facets for
  • "feeding-api-token": A token to access the items of your feeding API

e.g.

{
"config-api-results-url": "https://qsc.quasiris.de/api/v1/admin/config/my-tenant/generated-facets",
"config-api-results-token": "my-generated-config-token-1234567890abcdef",
"config-api-results-code": "generated-facets",
"config-api-results-tenant-code": "my-tenant",
"feeding-api-url": "https://qsc.quasiris.de/api/v1/feeding/my-tenant/my-feeding/_data-full",
"feeding-api-token": "my-generated-feeding-api-token-1234567890abcdef"

[...]
}

For all other available configuration options see Configuration Options

TODO how to select custom config in QSC UI

Generate Facets

TODO Where to start the process

Get Result

You can find the results of your facet generation run at your specified results config, e.g.: https://qsc.quasiris.de/admin/#/config/my-tenant/generated facets

An example of the resulting JSON:

{
"category-identifier-12345": {
"scores": [
{
"attribute": "id-123-456-789",
"score": 0.42529928693018315
},
{
"attribute": "weight",
"score": 0.4216055372330198
},
{
"attribute": "color",
"score": 0.4122466859450757
},

[...]

]
},

...

}

Configuration Options

The following config JSON options are available:

Required Settings

Config JSON element nameDefault ValueDescription
config-api-results-urlNoneQSC Admin Config API URL to which the results are uploaded
config-api-results-tokenNoneX-QSC-Token as authentication for uploading results as a config to QSC Admin API
config-api-results-codeNoneQSC Admin config code of your results config, used for writing the resulting generated configs into a QSC Admin config
config-api-results-tenant-codeNoneQSC Admin tenant code, used to identify you for the usage of QSC APIs available to you
feeding-api-urlNoneQSC Admin Feeding API URL, used to gather your feeding data for scoring the available attributes as facets
feeding-api-tokenNoneX-QSC-Token as authentication for reading data from feeding API

TODO: make tenant code fixed by UI, so customers don't have to enter tenant codes, and cannot enter a wrong code by accident

Optional Settings

Config JSON element nameDefault ValueDescription
config-api-settings-urlNoneConfig API URL from which a QSC Admin config can be loaded to receive the settings for your run
config-api-settings-tokenNoneX-QSC-Token as authentication for reading the settings from QSC Admin API
feeding-api-page-size1000Page size of the feeding API requests
feeding-api-sleep-time1.0Time between feeding API requests in seconds, used to reduce server load
feeding-api-product-id-elementidJSON element of the product that identifies the (list of) product id(s)
feeding-api-category-id-elementcategoriesJSON element of the product that identifies the (list of) category id(s). Categories can also be in the QSC format described on QSC Admin Docs
scoring_ignore_attribute_tag"ignore"attributes that contain a 'tags' list which contains this tag are ignored for scoring
scoring-min-products-in-category10Minimum amount of products a category needs to be considered in scoring
scoring-top-n-attributes10Amount of top attributes per category in top output file
scoring-features-jsonCombination of basic features that were useful in customer scenariosjson containing feature settings

Configuring features JSON

If you want to use your own set of features to get the most fitting attributes, you have to define them within a JSON string. Each feature can be used once and all features can be individually combined.

Feature JSON structure

"scoring-features-json":
{
[feature-name]:
{
"linear-interval": "-11",
"sigmoid-curvature": 0.35,
"weight": 0.4
},

[...]

}

Available features

You can choose between multiple features or use a combination of them in order to fine tune your facet selection. The default setting uses all available features that are solely based on attributes, with settings that were useful in customer scenarios.

Feature NameDescription
coverageIn how many percent of the categories products is the attribute used
comparisonDifference in coverage of an attribute between all categories and the current category. A negative score means, the attribute is used less often in this category than in general. A positive score means the attribute is used more often in this category than in general.
positionThe distance between maximum attribute position and the average position of the current attribute inside the same category. Higher scores indicate that an attribute is at a leading position.
varianceThe amount of variance in the value distribution for each attribute. If you have an evenly distributed usage of possible attribute values, the variance is low. If some attribute values are used very often and some very seldom, the variance is high. If you prefer lower variance, weight this feature negatively!

Required values for each feature

Each feature has to be configures on an interval, a sigmoid curvature and a weight that is used to weight each feature into the overall score for each attribute.

ValueTypeDescription
linear-intervaleither "01" or "-11"Determines to which interval scores of this feature are normalized to. Either interval [0, 1] or [-1, 1] is used.
sigmoid-curvaturefloat value in [-1, 1]Determines a curvature used for scaling the normalized scores of this feature in a nonlinear way via a sigmoid function. See this external interactive sigmoid graph for a demo of how scores are scaled by the curvature value
weightfloat valueDetermines the multiplier with which the normalized and scaled scores of this feature are weighted. When using weight values from [0, 1], you will only get scores in [-1, 1] which you can treat similar to confidence scores.
Pipeline Overview

To better understand the sequence of feature calculation, normalization, scaling and weighting for configuring the required values, you can take a look at the process diagram:

Pipeline of the qsc-facet-scoring process