Minimum Similarity

When searching for records, the FACT-Finder algorithm compares the search term with the available data records and rates each hit's similarity as a percentage. The closer the match between search term and hit, the higher the similarity value, with 100% being the ideal. The Minimum similarity and Spread parameters together determine whether a record is included in the results list. The minimum similarity is a static percentage value and represents an absolute lower limit. If the similarity of a record falls below the stipulated minimum similarity, it is not included in the results list.

Example (prior to application of spread):

FACT-Finder finds five records with the following similarities: 90%, 88%, 80%, 78% and 75%. The minimum similarity is 80%. Three records are included in the results list – those with the similarity values 90%, 88% and 80%.
If you set minimum similarity to 78%, then four hits are returned.
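
The threshold behaviour in this example can be sketched in a few lines of Python. This is only an illustration of the rule described above, not FACT-Finder code: the function name filter_by_min_similarity is hypothetical, similarities are assumed to be plain percentage numbers, and the Spread parameter is ignored.

```python
def filter_by_min_similarity(similarities, min_similarity):
    """Keep only the similarity values (in percent) that do not fall
    below the minimum similarity. Hypothetical helper for illustration."""
    return [s for s in similarities if s >= min_similarity]


# The five records from the example above.
hits = [90, 88, 80, 78, 75]

print(filter_by_min_similarity(hits, 80))  # [90, 88, 80]
print(filter_by_min_similarity(hits, 78))  # [90, 88, 80, 78]
```

A record whose similarity equals the minimum similarity is kept, because only records that fall below the limit are excluded.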

Impact

The Minimum similarity parameter has an indirect impact on the search performance and a direct impact on the quality of results.

Quality of Results

A high minimum similarity means only the more exact hits will be included. A lower minimum similarity allows a greater tolerance for error but also increases the probability of finding irrelevant hits.

Search Performance

In general, FACT-Finder attempts to find as many data records as possible. The search process stops as soon as one of the following applies:

  1. FACT-Finder has found a sufficient number of records (see the table in the section on Speed vs Accuracy). 
  2. FACT-Finder has found everything it can within the rules established by its parameters.

Search terms that return a large result set tend to be limited by the first of these. If the minimum similarity is set to a low value, records with lower similarity also count as hits, so the maximum number of search hits is reached more quickly. If the minimum similarity is set to a high value, FACT-Finder can only include records with a high similarity in the result set, and such records tend to be fewer in number. For these search terms, a search with a low minimum similarity therefore terminates more quickly than one with a high minimum similarity.
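
The effect can be illustrated with a deliberately simplified toy model. It is not how FACT-Finder searches internally. It assumes that candidate records are examined one after another and that the search stops once a fixed number of hits has been accepted. The function name candidates_examined, the max_hits parameter and the sample values are all hypothetical.

```python
def candidates_examined(similarities, min_similarity, max_hits):
    """Return how many candidates a toy search looks at before it stops.

    The toy search stops as soon as max_hits records have reached the
    minimum similarity, or when it runs out of candidates.
    """
    accepted = 0
    for examined, similarity in enumerate(similarities, start=1):
        if similarity >= min_similarity:
            accepted += 1
            if accepted == max_hits:
                return examined  # enough hits found, stop early
    return len(similarities)  # all candidates had to be examined


# Hypothetical similarity values in the order the toy search sees them.
candidates = [90, 70, 85, 72, 88, 74, 95, 71, 82, 76]

print(candidates_examined(candidates, min_similarity=70, max_hits=3))  # 3
print(candidates_examined(candidates, min_similarity=80, max_hits=3))  # 5
```

With the lower threshold, the first three candidates already fill the hit limit. With the higher threshold, the toy search has to look at five candidates to collect the same number of hits.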

How to Change Settings

See: Search Algorithm.

Recommendation

We recommend setting the minimum similarity to a value between 78% and 80%. If the results contain too many unsuitable articles with a low degree of similarity, increase the value of this parameter. We do not recommend lowering the minimum similarity below 75%.
