Software Setup

Java

Essential Settings

FACT-Finder requires two environment variables:

  • FACTFINDER_RESOURCES denotes the path under which application-specific folders for application data (e.g. configurations, exports, etc.) are found or created. You can choose your own path; we recommend /opt/factfinder.
  • FACTFINDER_KEY denotes the path of your licence file.

Ideally, pass these variables to Tomcat via setenv.sh. In our case, the file was created under /usr/share/tomcat8/bin with the following contents:

export JAVA_OPTS="$JAVA_OPTS\
 -DFACTFINDER_RESOURCES=/opt/factfinder\
 -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey"

After restarting Tomcat, the variables should be visible in the process. Verify this with the following command: ps aux | grep FACTFINDER.
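A plain grep over the ps output also matches the grep process itself. As a slightly stricter check, the following sketch reads the JVM arguments and reports each property separately. The function name check_ff_props and the <PID> placeholder are assumptions for illustration and not part of FACT-Finder.

```shell
#!/usr/bin/env bash
# Reads JVM arguments (one per line) from stdin and reports whether both
# FACT-Finder properties are present. Returns 0 only if both are found.
check_ff_props() {
  local args missing=0 name
  args="$(cat)"
  for name in FACTFINDER_RESOURCES FACTFINDER_KEY; do
    if printf '%s\n' "$args" | grep -q -- "^-D${name}="; then
      echo "${name}: set"
    else
      echo "${name}: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (<PID> is a placeholder for your Tomcat process ID):
#   tr '\0' '\n' < /proc/<PID>/cmdline | check_ff_props
```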

Recommended Settings

In addition to the FACT-Finder-specific environment variables, we also recommend explicitly setting memory allocations for Java. The initial heap space is set with the parameter -Xms, the maximum heap space with -Xmx. These values depend on FACT-Finder's database size and its allocated RAM.

Usually, the Java-Heap should be as large as possible. This allows memory-based caches to grow and improves performance. The Java-Heap does, however, compete with the Native-Heap. FACT-Finder databases are loaded into the Native-Heap, not the Java-Heap. The larger the Java-Heap is, the less room there is for the Native-Heap. When allocating space for the Java-Heap, always leave enough room for the Native-Heap.

Example: A FACT-Finder instance is running three channels. The FACT-Finder product databases are 50 MB, 70 MB, and 130 MB. The Multi-Lib function has been activated for the third channel and the number of libs was set to 4. Here is how to roughly calculate the memory requirements for the Native-Heap (in MB):

((50 + 70 + 130 * 4) + (5 + 7 + 13) + 130) * 2 = 1590

The first three numbers are the sizes of the databases. The Multi-Lib function loads a database multiple times, so the size of that database is multiplied by the number of configured libs.

The next three numbers are the estimated sizes of the channels' Suggest databases. Especially with large databases, their impact cannot be disregarded.

The last value represents the memory required for the product data import. Since the import runs in parallel to the search, you need to allocate extra memory. Imports are never performed simultaneously, so allocating memory for a single channel is enough. Base this on the size of the largest channel. The total is multiplied by two, since databases require twice as much RAM as disk space.
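The calculation can be sketched as a small shell function. This is only an illustration of the rule of thumb described here; the function name native_heap_mb and the "size:libs:suggest" argument format are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Rough Native-Heap estimate in MB.
# Arguments: one "dbSizeMB:libs:suggestSizeMB" triple per channel.
native_heap_mb() {
  local db_total=0 suggest_total=0 largest=0 spec db libs suggest
  for spec in "$@"; do
    IFS=: read -r db libs suggest <<< "$spec"
    db_total=$(( db_total + db * libs ))          # Multi-Lib: one copy per lib
    suggest_total=$(( suggest_total + suggest ))  # Suggest database of the channel
    if (( db > largest )); then largest=$db; fi   # import headroom: largest channel
  done
  # Databases need about twice as much RAM as disk space.
  echo $(( (db_total + suggest_total + largest) * 2 ))
}

native_heap_mb 50:1:5 70:1:7 130:4:13   # prints 1590
```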

If you activate asynchronous reloading for an instance's database (tableLoadAsync=true in fff.properties), the reload requires additional memory. Loading the database needs exactly as much memory as holding the database. If channel 3 requires 130 * 4 * 2 MB, then an asynchronous reload needs the same amount.

Parallel processing is more complicated when loading a database than when doing an import. In theory, multiple databases can be loaded at once. In that case, they would need exactly as much memory as they need to run. In practice, this only occurs in niche cases. Without external input, FACT-Finder databases are always loaded in sequence, never all at once. An exception occurs if the database reload API is called multiple times and more than one database registers changes. Those databases are then reloaded simultaneously.

This is the extended version of the above formula for asynchronous reloads (assuming that databases are not reloaded simultaneously):

((50 + 70 + 130 * 4) + (5 + 7 + 13) + 130 + (130 * 4)) * 2 = 2630
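The additional term can be derived as follows. The sketch assumes that the reload headroom equals the largest loaded database (size multiplied by libs), which matches the example but is an interpretation rather than a documented rule. The function name reload_extra_mb is hypothetical.

```shell
#!/usr/bin/env bash
# Extra Native-Heap (MB, before doubling) for one asynchronous reload:
# the largest loaded database, i.e. the maximum of size * libs.
# Arguments: one "dbSizeMB:libs" pair per channel.
reload_extra_mb() {
  local max=0 spec db libs
  for spec in "$@"; do
    IFS=: read -r db libs <<< "$spec"
    if (( db * libs > max )); then max=$(( db * libs )); fi
  done
  echo "$max"
}

base=795                                    # (50 + 70 + 130 * 4) + (5 + 7 + 13) + 130
extra="$(reload_extra_mb 50:1 70:1 130:4)"  # 520
echo $(( (base + extra) * 2 ))              # prints 2630
```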

If you have installed the recommended tcmalloc library, activate it via setenv.sh. This is done via the second export command in the file.

With concrete values, setenv.sh could look like this:

export JAVA_OPTS="$JAVA_OPTS\
 -DFACTFINDER_RESOURCES=/opt/factfinder\
 -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey\
 -Xms512m\
 -Xmx3g"

export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4

Starting from FACT-Finder 7.3.4-0, we recommend setting the thread stack size (Xss) to 4m:

-Xss4m

The default value is Xss2m.

To check whether tcmalloc was activated successfully, restart Tomcat and enter this command:

sudo grep malloc /proc/[TOMCAT-PROCESS-ID]/maps
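If you want to script this check, a minimal sketch could look like the following. The function name tcmalloc_loaded is hypothetical, and the pgrep pattern in the usage comment is an assumption about how Tomcat was started; adjust it to your setup.

```shell
#!/usr/bin/env bash
# Returns 0 if the given memory-map file lists a tcmalloc library.
tcmalloc_loaded() {
  grep -q 'libtcmalloc' "$1"
}

# Usage sketch:
#   pid="$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n 1)"
#   sudo cat "/proc/${pid}/maps" > /tmp/tomcat.maps
#   tcmalloc_loaded /tmp/tomcat.maps && echo "tcmalloc active"
```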

Optional Settings

The Java options described so far are essential for operating FACT-Finder. We also recommend the following additional settings. These are nonessential. Check for company-specific standards before applying them.

Omikron uses the file below for standard FACT-Finder operation. In addition to the default Tomcat and Java settings, it contains the following parameters:

  • RECOVERY_SCRIPT and -XX:OnOutOfMemoryError refer to Omikron's emergency script, which collects data for analysis in case of an error. For more information, see "Nützliche Skripte" (Useful Scripts). The JVM running Tomcat executes -XX:OnOutOfMemoryError when it registers an OutOfMemory case. RECOVERY_SCRIPT is a FACT-Finder-specific variable that determines which script to run when the program detects a critical error and tries to restart automatically.

  • productiveTomcat is used by the e.sh script to identify the Tomcat process for which it collects data and which it is supposed to terminate.

  • The instance parameter is set to a value that identifies the system/search application. This is especially advisable in scenarios running multiple search applications.

export JAVA_OPTS="$JAVA_OPTS\
 -DFACTFINDER_RESOURCES=/opt/factfinder\
 -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey\
 -Xms512m\
 -Xmx3g\
 -DRECOVERY_SCRIPT=/opt/factfinder/e.sh\
 -XX:OnOutOfMemoryError=/opt/factfinder/e.sh\
 -DproductiveTomcat\
 -Dinstance=$(hostname)\
 -Djava.awt.headless=true\
 -Djava.security.egd=file:/dev/./urandom\
 -Dcom.sun.management.jmxremote\
 -verbose:gc\
 -XX:+PrintGCDetails\
 -XX:+PrintGCTimeStamps\
 -XX:+PrintGCDateStamps\
 -XX:+UseGCOverheadLimit\
 -XX:GCHeapFreeLimit=25\
 -XX:NewRatio=2\
 -XX:MaxMetaspaceSize=2g\
 -XX:ErrorFile=/opt/factfinder/dumps/hs_err_pid\$\$.log\
 -Xloggc:/opt/factfinder/dumps/gclog.$(date +%F_%R).txt\
 -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true\
 -Dorg.apache.jasper.Constants.DEFAULT_TAG_BUFFER_SIZE=2000000\
 -Dderby.stream.error.file=/opt/factfinder/dumps/derby.\$\$.log"

export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4

Tomcat

Essential Settings

Make sure the Tomcat encoding is set to UTF-8. You can check this via the attribute URIEncoding="UTF-8" within the Connector tags in server.xml. In our case, the file is found at /var/lib/tomcat8/conf. From Tomcat 8 on, UTF-8 is the default setting; earlier versions require adding the URIEncoding attribute manually.

Correctly configured, the Connector configuration should look like this:

<Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               URIEncoding="UTF-8"
               redirectPort="8443" />

Recommended Settings

In Tomcat, you define thread pooling with an executor. Set both the minSpareThreads and maxThreads attributes to 4 x the number of CPU cores. maxThreads defines the maximum number of threads in the pool, minSpareThreads the minimum number of threads kept alive. The prestartminSpareThreads attribute ensures that the minSpareThreads are started when the executor launches. The connector references the executor by setting its executor attribute to the executor's name.
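As a rough aid, the pool size can be derived from the number of cores on the machine. The following sketch uses nproc from GNU coreutils; the function name pool_size is an assumption for this example, and the printed Executor element is only a starting point for server.xml.

```shell
#!/usr/bin/env bash
# Suggested executor pool size: 4 threads per CPU core.
pool_size() {
  local cores="${1:-$(nproc)}"   # defaults to the core count reported by nproc
  echo $(( cores * 4 ))
}

threads="$(pool_size)"
printf '<Executor name="tomcatThreadPool" maxThreads="%s"\n' "$threads"
printf '    minSpareThreads="%s" prestartminSpareThreads="true"/>\n' "$threads"
```

On a machine with 8 cores, pool_size returns 32, which matches the values used in the Executor configuration in this section.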

We recommend these additional connector settings:

  • Change protocol to org.apache.coyote.http11.Http11NioProtocol, since this is a non-blocking Java connector.
  • Set acceptCount to a relatively high value. It determines how many requests are accepted when all Tomcat threads are busy.
  • Activate GZIP compression via the compression, compressableMimeType, and compressionMinSize attributes. In some cases (e.g. navigation), search responses can be very large, so network transfer time becomes a large part of the total search time. compressableMimeType determines which response types are compressed, and compressionMinSize sets the minimum size above which a response is compressed.

<Executor name="tomcatThreadPool"
          maxThreads="32"
          minSpareThreads="32"
          prestartminSpareThreads="true" />

<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443"
           executor="tomcatThreadPool"
           acceptCount="1000"
           compression="on"
           compressableMimeType="text/html,text/xml,text/plain,text/javascript,text/css,application/json"
           compressionMinSize="2048" />

Optional Settings

In our experience, modifying the AccessLogValve is useful for adding more information to the access log files. Omikron's monitoring scripts require this format. To use them, you need to apply the configuration below.

The spaces in the pattern attribute are tab characters.

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
	prefix="localhost_access_log." suffix=".txt"
	pattern="%h %t  %S  &quot;%r&quot; &quot;%{Referer}i&quot;  %s  %b  %D" 
    resolveHosts="false"
    buffered="true" />
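Because the fields are separated by tabs, the resulting log is easy to process with standard tools. The following sketch is not one of Omikron's monitoring scripts; it only illustrates reading the last column (%D, the request processing time) with awk. The function name access_log_stats and the log path in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Prints the request count and the mean processing time from a
# tab-separated access log. Assumes %D is the last column.
access_log_stats() {
  awk -F '\t' '
    $NF ~ /^[0-9]+$/ { total += $NF; n++ }
    END { if (n) printf "%d requests, mean %.1f\n", n, total / n }
  ' "$1"
}

# Usage (hypothetical path and file name):
#   access_log_stats /var/lib/tomcat8/logs/localhost_access_log.2024-01-01.txt
```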
