
Category Select Component

· 2 min read

The category select component displays a navigation or tree structure in drop down boxes.

  • the first dropdown lists all entries of level 0
  • when the user selects an entry in the first dropdown, a second dropdown appears which contains all children of the selected entry
  • the siblings in the first dropdown remain visible

Advantages

  • the tree can have many elements but the component stays small
  • for each level, just one drop down box

API

  • in the API the data is returned as a facet

  • the id starts with the configured id: myCategory

  • followed by the suffix Tree

  • and the number of the level

  • myCategoryTree0

  • myCategoryTree1

  • myCategoryTree2

  • when the user selects an entry in the first dropdown, the filter of this entry must be used as a filter in the next request

  • the QSC automatically returns the next level

```
|--> Kunden
|    |--> Buchhaltung
|    |    |--> Konto
|    |    |--> Service
|    |    |--> Kontakte
|    |    |--> Kündigung
|    |--> Konzerne
|    |--> SMB
|--> Vertrieb
|--> Marketing
```
```json
{
  "facets": [
    {
      "name": "myCategory",
      "id": "myCategoryTree0",
      "type": "categorySelect",
      "filterName": "myCategoryTree0",
      "count": 3,
      "resultCount": 67,
      "values": [
        {
          "value": "Kunden",
          "count": 63,
          "filter": "myCategoryTree0=123456%7C-%7C2%7C-%7CKunden"
        },
        {
          "value": "Vertrieb",
          "count": 2,
          "filter": "myCategoryTree0=211022%7C-%7C1%7C-%7CVertrieb"
        },
        {
          "value": "Marketing",
          "count": 2,
          "filter": "myCategoryTree0=255044%7C-%7C3%7C-%7CMarketing"
        }
      ]
    },
    {
      "name": "myCategory",
      "id": "myCategoryTree1",
      "type": "categorySelect",
      "filterName": "myCategoryTree1",
      "count": 3,
      "resultCount": 18,
      "values": [
        {
          "value": "Buchhaltung",
          "count": 10,
          "filter": "myCategoryTree1=123456%7C-%7C2%7C-%7CKunden%7C___%7C789012%7C-%7C14%7C-%7CBuchhaltung"
        },
        {
          "value": "Konzerne",
          "count": 4,
          "filter": "myCategoryTree1=123456%7C-%7C2%7C-%7CKunden%7C___%7C132277%7C-%7C8%7C-%7CKonzerne"
        },
        {
          "value": "SMB",
          "count": 4,
          "filter": "myCategoryTree1=123456%7C-%7C2%7C-%7CKunden%7C___%7C732606%7C-%7C12%7C-%7CSMB"
        }
      ]
    },
    {
      "name": "myCategory",
      "id": "myCategoryTree2",
      "type": "categorySelect",
      "filterName": "myCategoryTree2",
      "count": 4,
      "resultCount": 10,
      "values": [
        {
          "value": "Konto",
          "count": 4,
          "filter": "myCategoryTree2=123456%7C-%7C2%7C-%7CKunden%7C___%7C789012%7C-%7C14%7C-%7CBuchhaltung%7C___%7C237126%7C-%7C2%7C-%7CKonto"
        },
        {
          "value": "Service",
          "count": 2,
          "filter": "myCategoryTree2=123456%7C-%7C2%7C-%7CKunden%7C___%7C789012%7C-%7C14%7C-%7CBuchhaltung%7C___%7C244122%7C-%7C0%7C-%7CService"
        },
        {
          "value": "Kontakte",
          "count": 2,
          "filter": "myCategoryTree2=123456%7C-%7C2%7C-%7CKunden%7C___%7C789012%7C-%7C14%7C-%7CBuchhaltung%7C___%7C937264%7C-%7C1%7C-%7CKontakte"
        },
        {
          "value": "Kündigung",
          "count": 2,
          "filter": "myCategoryTree2=123456%7C-%7C2%7C-%7CKunden%7C___%7C789012%7C-%7C14%7C-%7CBuchhaltung%7C___%7C837440%7C-%7C4%7C-%7CK%C3%BCndigung"
        }
      ]
    }
  ]
}
```
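On the client side, the selection step can be sketched as: take the `filter` string of the chosen facet value and merge it into the parameters of the next search request. The parameter handling below is an illustrative assumption, not the official client API:

```python
from urllib.parse import unquote

def apply_category_filter(params, filter_string):
    """Merge the `filter` string of the selected facet value into the
    parameters of the next search request. The string is already
    URL-encoded, e.g. "myCategoryTree0=123456%7C-%7C2%7C-%7CKunden"."""
    name, _, value = filter_string.partition("=")
    merged = dict(params)
    merged[name] = unquote(value)  # %7C decodes to the '|' separator
    return merged

params = apply_category_filter(
    {"q": "*"},
    "myCategoryTree0=123456%7C-%7C2%7C-%7CKunden")
```

Sending the merged parameters with the next request makes the QSC return the facet for the next level (myCategoryTree1).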

Category

· One min read

In the QSC a category tree can be managed. It is possible to change the name and the order of categories.

  • https://qsc.quasiris.de/api/v1/category/ab/navigation
  • https://qsc.quasiris.de/admin/#/feeding/ab/products

  • strategy - passthrough, default, facet

Update

There are two ways to update the category tree.

  • 1.) update the category tree from an existing search with a navigation facet
  • 2.) update the category tree programmatically

The search with the defined searchCode and searchQuery is called. The searchCode must be configured in the search, and the search must contain a facet with:

  • id: categories

  • fieldName: category

  • searchQuery

  • searchCode

POST https://qsc.quasiris.de/api/v1/admin/category/merge/facet/ab/navigation

Retrieve

GET https://qsc.quasiris.de/api/v1/category/ab/navigation

  • defaultSort - name, count, default
  • strategy
    • passthrough: the category tree is loaded directly from the configuration; count and the existence of a category are not computed at runtime
    • default: only existing categories are returned, the count is computed at runtime
    • facet: the tree is computed on every request using a facet
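As a sketch, the retrieve call with its options might be assembled like this. Passing `defaultSort` and `strategy` as query parameters is an assumption based on the options listed above, not a confirmed API contract:

```python
from urllib.parse import urlencode

def category_tree_url(tenant, code, default_sort="name", strategy="default"):
    """Build the retrieve URL for the category tree endpoint.
    defaultSort and strategy as query parameters are assumptions."""
    base = f"https://qsc.quasiris.de/api/v1/category/{tenant}/{code}"
    return base + "?" + urlencode({"defaultSort": default_sort,
                                   "strategy": strategy})

url = category_tree_url("ab", "navigation", strategy="passthrough")
```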

mixed-search-results

· One min read
  • mix CMS teasers into the search results: every fourth item is a CMS item
  • content module on the right
  • content teaser after the first row

Relevant Facets for search queries

· 2 min read
  • preselected filter values
    • precompute them with AI
    • make them maintainable via the admin tool

Precompute filters for each category

  • for each category, facets are precomputed based on statistical data
    • the attributes with the most values for a category are good facets
    • the usage of the facets is tracked; facets that are used more often are more important
    • TODO: position bias - facets on top positions are clicked more often because of their position
    • TODO: facets that are never shown to the user have no chance to get clicks
  • a classifier is trained
    • input: query (use tracking data)
    • output: category
  • for each query, the classifier is called and the facets of the classified category are used

  • for most historical queries, facets are precomputed
  • it is not possible to precompute facets for every potential query a user can issue
  • the idea is to use a vector distance to find the nearest query and use the facets of that query
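The nearest-query lookup can be sketched with a plain cosine similarity over precomputed query vectors. The embedding itself and the data layout are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def facets_for_query(query_vec, precomputed):
    """Return the precomputed facets of the historical query whose
    vector is closest to the incoming query's vector.
    `precomputed` maps query -> (vector, facets), built offline."""
    best_query = max(precomputed,
                     key=lambda q: cosine(query_vec, precomputed[q][0]))
    return precomputed[best_query][1]
```

In practice an approximate nearest-neighbour index would replace the linear scan, but the idea is the same.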

Automatically compute filter on query time

  • index the filter for each document - field: filters
  • for each query, facet on the field filters to determine relevant filters
  • show the facets for the search result
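The second step above can be sketched as: rank the values returned by the facet on the `filters` field and keep the most frequent filter names, which are then requested as facets in the follow-up query. The value shape mirrors the facet JSON shown earlier; the ranking itself is illustrative:

```python
def relevant_facets(filter_facet_values, top_n=3):
    """Rank the values of the first query's facet on the `filters`
    field and keep the most frequent filter names; these are then
    requested as facets in the second, user-facing query."""
    ranked = sorted(filter_facet_values,
                    key=lambda v: v["count"], reverse=True)
    return [v["value"] for v in ranked[:top_n]]
```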

Advantages

  • the filters can be computed automatically without manual effort

Disadvantages

  • two sequential search queries are required, which doubles the search time
  • the relevant filters are based on statistical values and might not be optimal in every case

Automatically select filters for specific queries

  • manually define selected filters for a query
  • try to find filters based on statistical data

Disadvantages

  • similar queries are not covered by the manual definition

Facet Types

  • Slider
  • Slider with histogram
  • Color picker
  • Date picker
  • Shoe size picker

Ideas

  • show the image of the first matching product on the facet value, as a preview of what the user gets when the filter is selected

Spellcheck

· One min read

POS Tagging

Data Sources

  • index data
    • words
    • bigrams
    • trigrams
  • search queries of the users
    • make sure that the queries are correct
    • query returns hits
    • words of the query are in the index data
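Extracting the word, bigram and trigram statistics from index data can be sketched like this (whitespace tokenization is a simplification):

```python
def ngrams(text, n):
    """Word n-grams of a field value; counting these over the whole
    index yields the word/bigram/trigram statistics listed above."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
```

Counting `ngrams(doc, 2)` over all documents gives the bigram frequencies; a candidate spelling correction can then be scored against these counts.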

Manifest

· 12 min read

TL;DR

  • involve the devops team as early as possible in your team – ideally before writing the first line of code/choosing a vendor, so none of us will be surprised
  • use a git repo available on the public internet – (Quasiris internal employees get an account immediately, external colleagues need to sign an NDA)
  • use automatically built docker containers, builds triggered by tracking the respective git branches/tags
  • use the Quasiris trusted docker registry including notary services to ensure docker image integrity and security standards
  • the remainder of this document can be seen as the Quasiris version of the 12 factor app (https://12factor.net) – with the appropriate amendments/additions/omissions to make sense in the Quasiris sphere of influence
  • any communication with the devops team should be via github tickets to ensure that everything is documented
  • kubernetes is the container orchestration of choice – it can be hard to get your head around, but help is available from the devops team
  • if you plan to use a microservice architecture make sure you are up for it – consider testing/deployment/versioning/orchestration

rules of engagement with devops

how to get devops work done in a controlled way

  • devops is part of the development team, the same as product owner/developer/tester
  • late involvement (after writing of code/vendor selection) will lead to late deployment/delivery/launch and/or higher risk to Quasiris
  • the role of the devops team member in the project is to ensure delivery to production including quality control/audit trail can happen in a smooth fashion possibly reusing blueprints from other projects – knowledge of other projects is spread within the devops team

code base

how to treat Quasiris intellectual property

  • all code written for Quasiris – including automation/build pipelines/. . . – has to be kept in an available git repository – if it is not pushed to git it is not worth keeping/being backed up
  • please use devopslab for that purpose
  • please use git branching to reflect the different stages of development/features – a good starting point is git flow (described here: https://datasift.github.io/gitflow/IntroducingGitFlow.html)
  • you might decide in your project you need more branches – if sensible that is OK, but please reflect on what problem you want to solve with that
  • continuous integration/unit tests are a must
  • tooling to document test coverage/test failure & success is highly recommended
  • you need to decide which branches trigger a docker image build, and which one of your branches (and it can be only one) triggers the docker image build for a release candidate
  • the deployable artifact is an immutable docker image, which gets promoted from the k8s dev cluster first to staging and then to production by the use of git tags (tags matching a certain pattern will be picked up by CI/CD accordingly)
  • instead of using git SHA commits as the reference pointer, the tags for staging and production should use a pattern like this: ^(staging|production)-\d+\.\d+\.\d+$ – semantic versioning as per https://semver.org/
  • as the docker image is unchangeable and is supposed to run in dev/staging/production, any environment specifics need to be external to that image – in the k8s context that means they need to be presented to the container as either environment variables/arguments/config maps/secrets
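The release tag pattern can be checked with a small regex, reconstructed here as standard semantic versioning:

```python
import re

# ^(staging|production)-\d+\.\d+\.\d+$ -- semver-style release tags
TAG_PATTERN = re.compile(r"^(staging|production)-\d+\.\d+\.\d+$")

def is_release_tag(tag):
    """True for tags CI/CD should promote, e.g. 'production-1.4.2'."""
    return TAG_PATTERN.fullmatch(tag) is not None
```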

configuration

let’s make the docker image immutable

configuration items should be separated into at least these classes:

  • general (non environment specific) configuration
  • environment specific configuration
  • security/data privacy relevant configuration (could be environment specific or not, eg API access tokens)
  • all configuration should be fed to the runtime as environment variables/arguments/config maps/secrets
  • the runtime build should not need any changes to run in different environments (eg single instance on local laptop vs. fully resilient and scaled out production service)
  • backend services needed for running your service should be configured externally – not only the endpoint itself, but also timeouts/thread pools/queue sizes
  • special care/consideration/justification should be applied to any stateful services/persistent storage requirements – any reconfiguration of these endpoints needs to be done via configuration external to the docker image
  • should you need persistent storage please consider backup/retention policy/replication/resiliency/scalability requirements – if possible use services offered by the cloud provider (OTC – RDBMS: mysql/postgres, key/value: redis, S3 compatible object store) and only if they are not sufficient consider using helm charts and then rolling your own
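A minimal sketch of reading environment-specific configuration from the container environment, so the image itself stays immutable. The variable names here are illustrative, not an established convention:

```python
import os

def load_config(env=None):
    """Collect environment-specific settings from the process
    environment so the docker image itself stays immutable.
    Variable names are illustrative, not a fixed convention."""
    env = os.environ if env is None else env
    return {
        "backend_url": env["BACKEND_URL"],  # required, no default
        "timeout_ms": int(env.get("BACKEND_TIMEOUT_MS", "500")),
        "pool_size": int(env.get("BACKEND_POOL_SIZE", "10")),
    }
```

Note that not only the endpoint but also timeouts and pool sizes are externalized, as required above.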

build

getting from code to deployable artifact

docker images are the one and only deployable artifact type. docker images should be built following the guidelines below: configuration via environment variables/arguments/config maps/secrets to allow redeployment of immutable images in different stages of the development cycle (dev/staging/performance test/production)

  • docker images shall be minimal in size – ideally just one statically linked binary; if that is not possible a minimal Linux distribution shall be used. NO full Linux distribution inside the container, NO container as virtual machine replacement
  • MTR applies a Qualys vulnerability scan – if the built image does not pass that scan it cannot be deployed
  • Dockerfile shall be written following best practices:
    • use layering wisely (do not create data in one layer only to delete it in another, eg with multiple RUN commands; use && concatenation to keep it in one layer)
    • minimal install of packages (check dependencies carefully)
    • all resulting docker images must pass the magenta trusted registry CVE scan
    • do not run apps as root unless entirely necessary

getting from code to deployable artifact (cont)

  • Dockerfile shall be written following best practices: (cont)
    • if more than one process is running inside the container please consider your justification for doing so
    • use an init process (eg https://github.com/krallin/tini) to avoid PID 1 issues (eg certain versions of java do not allow debugging if running as PID 1)
    • docker images must be pushed to the Quasiris trusted docker registry
    • docker images must be signed for any deployments via docker notary
    • docker images shall be automatically built from tracked git repo branches to support continuous integration and continuous delivery
    • git commits should map to docker image repository tags directly; either omit the special latest tag in the docker registry or map it to your development branch – latest should never map to the production branch
    • any binary build artifacts should be kept in the devopslab jfrog artifactory
    • use volumes/pvs/pvcs for any persistent data, do not assume any data in the container survives for long/any updates
    • apply appropriate (language specific) dependency/vulnerability checking of third party libraries/modules/packages – and fail the build should the check detect an issue

microservices

as one developer said – “an architect’s wet dream, a developer’s nightmare” – so beware and make sure if you want to deploy as microservices you also reap the benefits

  • each microservice is represented as one docker container/k8s pod as an independently deployable artifact
  • each microservice's docker image is produced from one git repository and the end of the CI/CD pipeline for that repository is the docker image being pushed to MTR (producer repository)
  • testing is based on tests per microservice // contract based testing – if end2end testing of all components is a constant requirement please prepare yourself for the question why somebody thought implementing this service as microservices was a good idea . . .
  • services consisting of multiple microservices have a separate “umbrella” git repository that stores the knowledge about which version of which microservice in which configuration is needed to make the service work. CI/CD in this repository manages the whole service and assumes the existence of the producer docker images. (consumer repository)
  • for the avoidance of doubt: no information of what consumes a producer microservice is stored in the producer microservice repository
  • ongoing work within the DevOps team to make producer CI/CD emit an event to a central subscription service with knowledge about producer/consumer relations, followed by CI/CD triggers to the subscribed consumer repositories; first working version ready, further development with selected squads

kubernetes

out of scope for Quasiris at the moment

clusters

  • we have three main clusters // development, staging and production
  • and inbound/outbound/tools/monitoring (ELK) clusters on the side
  • they have been built by the devops team using ansible/kubespray and utilize the underlying OTC as cloud provider (LBaaS and block storage for PVCs/PVs)
  • each cluster is located in its own VPC on OTC

k8s considerations

bringing new services in has to be done in conjunction with the devops team member; please note especially:

  • kubernetes manifests describing secrets/configmaps/deployments/daemon sets/stateful sets/services/ingresses are owned by the devops and are shared in the development team – any changes to those manifests need to be discussed with the devops team member
  • CI/CD pipelines from a developers point of view only change the docker image tag for specific containers, the actual kubernetes manifests for the other components remain the same or any changes are managed by tools developed by the devops team (eg blue/green deployments//canary releases)
  • as a guide please plan a few days together with your devops team member to get the k8s manifests in an appropriate shape for the dev cluster and get CI/CD working on top of that, then move on to staging and probably find more config items needing to be removed from code and put into environment variables/config maps, and then move on to production
  • please take extra care when creating ingress definitions as our ingress controller will automatically try to obtain a SSL certificate for the specified hostname and that has to match the wildcard DNS entry for the environment (eg *.dev.eliza.telekom-dienste.de) – if it doesn't, traefik will hammer letsencrypt with the wrong hostname and we get banned
  • also take care when specifying services: only the types ClusterIP/headless/ExternalName are allowed, do not specify type LoadBalancer as it will allocate an OTC ELB including an EIP – it will not work externally, but utilizes resources and causes confusion
  • kubernetes is quite a learning curve – if unsure, ask. if needed the devops team can supply a temporary kubernetes cluster with full admin access to get to know it or give pointers/help for a local laptop install

logging

how do we know what the app is doing

  • apps send their log stream to STDOUT/STDERR in the container; it is the task of the container runtime to decide where logs go, eg on a development laptop docker stores them locally with the json driver, in production docker sends the log stream to an ELK cluster via the syslog driver
  • log events should be json formatted
  • each log event (read: log line) should contain the following fields:
    • timestamp in microsecond resolution
    • a custom tag (exposed via configuration) to identify the type of log event – for example the event type/application logging the event
    • a unique request identifier – this identifier should be set by the first service receiving an external request, sent on to any internal and external downstream services (internal ones will use it for logging, external backend services can use it to communicate with us during troubleshooting) and sent back to the original requesting party
    • a clear indication of success/error in the log message, including response time latency (microsecond resolution)
  • all logs are stored in a shared ELK cluster
  • zipkin/jaeger instrumentation is being evaluated right now // expect this section to change
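A log event with the required fields might be emitted like this. The field names are illustrative; only the required content (microsecond timestamp, custom tag, request id, success/error, latency) is taken from the list above:

```python
import json
import time

def log_event(tag, request_id, success, latency_us, **extra):
    """Emit one JSON-formatted log line to STDOUT; the container
    runtime decides where it goes from there."""
    event = {
        "timestamp_us": int(time.time() * 1_000_000),
        "tag": tag,
        "request_id": request_id,
        "success": success,
        "latency_us": latency_us,
        **extra,
    }
    print(json.dumps(event))  # STDOUT is the only log sink for the app
    return event
```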

go live considerations

some guidelines//thoughts

  • never assume a backend system is always available – should it fail make sure our service fails gracefully and meaningfully (eg no stack traces on customer screens)
  • one backend system failing shall never make traffic that is independent of that backend system fail as well
  • document the traffic model (which external URLs get hit how often and what traffic is generated from that to external systems) of your service in order to:
    • ease performance testing
    • make sure everybody understands all ingress/egress traffic and the relation between them
    • ideally you should be able to derive the traffic model of an existing service from the implemented logging and reporting
  • if you need to estimate your traffic when deploying a completely new service remember that:
    • end user traffic tends to follow a graph similar to a sine curve: peak traffic in the early evening, low traffic in the early morning hours
    • the average request rate over a day is about 50% of the request rate at peak time
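The rule of thumb above gives a quick sizing estimate, interpreting "average rate over a day is 50% of peak" literally:

```python
def estimate_daily_requests(peak_rps):
    """Daily request volume from the rule of thumb: the average
    request rate over a day is about 50% of the peak rate."""
    avg_rps = 0.5 * peak_rps
    return avg_rps * 86_400  # seconds per day
```

With a measured peak of 100 requests/s this gives roughly 4.3 million requests per day.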

performance testing

  • at regular intervals/before introducing a completely new service please run a performance test, some guidelines below:
    • according to your traffic model aim for 300% of the expected/observed peak load (or higher load if you have a low traffic service – make sure you put some stress on network/IO/memory/CPU)
    • backend systems need to be simulated by a stub server – this stub server needs to implement the appropriate responses from a content and a timing aspect (same latency distribution as agreed upon with the production backend system owner)
    • run this for 48 hours
    • tools to run tests are gatling (https://gatling.io/) and jmeter (https://jmeter.apache.org/); test scripts are to be supplied by the development team, the devops team member can probably find working examples from other teams
    • document the results of your load test and store them alongside the respective git commit
    • plot latency percentiles/CPU usage/network usage/IO load/memory profile/success & error over time – all of them should be flat (as in parallel to the x-axis) during the complete load test

data security and privacy

PSA – privacy and security assessment

  • Quasiris values data privacy and data security of its customers very highly
  • all production installations delivering services to Quasiris end customers need to go through a privacy and security assessment

Welcome

· One min read
Yangshun Tay
Front End Engineer @ Facebook

Blog features are powered by the blog plugin. Simply add files to the blog directory. It supports tags as well!

Delete the whole directory if you don't want the blog features. As simple as that!