Software Setup
Java
Essential Settings
FACT-Finder requires two environment variables:
- FACTFINDER_RESOURCES denotes the path under which FACT-Finder finds or creates the application-specific folders that hold application data (e.g. configurations, exports, etc.). You can choose your own path; we recommend /opt/factfinder.
- FACTFINDER_KEY denotes the path of your licence file.
These variables are ideally passed to Tomcat via setenv.sh. In our case, this file was created under /usr/share/tomcat8/bin with the following contents:
export JAVA_OPTS="$JAVA_OPTS \
  -DFACTFINDER_RESOURCES=/opt/factfinder \
  -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey"
After restarting Tomcat, the variables should be visible in the process. Check this with the following command:
ps aux | grep FACTFINDER
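The check above can also be scripted. The following is a minimal sketch, not part of FACT-Finder: the function name has_factfinder_opts is hypothetical, and it only inspects a command-line string that you pass to it (for example the arguments column of the ps output).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given command line contains
# both FACT-Finder system properties.
has_factfinder_opts() {
    case "$1" in
        *-DFACTFINDER_RESOURCES=*) ;;
        *) return 1 ;;
    esac
    case "$1" in
        *-DFACTFINDER_KEY=*) ;;
        *) return 1 ;;
    esac
    return 0
}

# Example with a made-up command line, as it might appear in the ps output.
sample='/usr/bin/java -DFACTFINDER_RESOURCES=/opt/factfinder -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey org.apache.catalina.startup.Bootstrap start'
if has_factfinder_opts "$sample"; then
    echo "both variables are set"
else
    echo "at least one variable is missing"
fi

# On a live system, with pid holding the Tomcat process ID:
#   has_factfinder_opts "$(ps -o args= -p "$pid")"
```

The helper checks the two options separately, so a setenv.sh that sets only one of them is reported as incomplete.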
Recommended Settings
In addition to the FACT-Finder-specific environment variables, we also recommend explicitly setting the memory allocation for Java. The initial heap size is set with the parameter -Xms, the maximum heap size with -Xmx. Suitable values depend on the size of the FACT-Finder databases and on the RAM allocated to FACT-Finder.
As a rule, the Java-Heap should be as large as possible, since this allows memory-based caches to grow and improves performance. The Java-Heap does, however, compete with the Native-Heap: FACT-Finder databases are loaded into the Native-Heap, not the Java-Heap, so the larger the Java-Heap is, the less room remains for the Native-Heap. When sizing the Java-Heap, always leave enough room for the Native-Heap.
Example: A FACT-Finder instance is running three channels. The FACT-Finder product databases are 50MB, 70MB and 130MB. The Multi-Lib function has been activated for the third channel and the number of libs was set to 4. Here is how to roughly calculate memory requirements for the Native-Heap:
((50 + 70 + 130 * 4) + (5 + 7 + 13) + 130) * 2 = 1590
The first three numbers are the sizes of the product databases. The Multi-Lib function loads a database once per configured lib, so the size of the third database is multiplied by the number of configured libs (4).
The next three numbers are the estimated sizes of the channels' Suggest databases. Especially with large databases, their impact cannot be discounted.
The last value represents the memory required for the product data import. Since the import runs in parallel to the search, you need to allocate extra memory for it. Imports are never performed simultaneously, so allocating memory for a single channel is enough; base this on the size of the largest channel. Finally, the whole sum is multiplied by two, since the databases require twice as much RAM as disk space.
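The estimate can be reproduced with plain shell arithmetic. This is only a sketch of the rough calculation described above; the variable names are chosen for illustration and all values are the example's sizes in MB.

```shell
#!/bin/sh
# Values from the example, all in MB.
db1=50; db2=70; db3=130      # product database sizes on disk
libs3=4                      # number of libs configured for channel 3 (Multi-Lib)
sg1=5; sg2=7; sg3=13         # estimated Suggest database sizes
import=130                   # import reserve: size of the largest database

databases=$(( db1 + db2 + db3 * libs3 ))
suggest=$(( sg1 + sg2 + sg3 ))
# The sum is doubled because the databases need about twice as much RAM
# as disk space.
native_mb=$(( (databases + suggest + import) * 2 ))

echo "Estimated Native-Heap: ${native_mb} MB"
```

With the example values, the script prints an estimate of 1590 MB, matching the formula.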
If you activate asynchronous reloading for an instance's databases (tableLoadAsync=true in fff.properties), the reload requires additional memory. Loading a database needs exactly as much memory as holding it. If channel 3 requires 130 * 4 * 2 MB, then an asynchronous reload of that channel needs the same amount again.
Parallel processing is more complicated when loading databases than when importing. In theory, multiple databases can be loaded at once; in that case, the reload would need as much additional memory as all of those databases need to run. In practice, this only occurs in niche cases. Without external input, FACT-Finder databases are always loaded in sequence, never all at once. An exception arises if the database reload API is called multiple times and more than one database has registered changes: those databases are then reloaded simultaneously.
This is the above formula extended for asynchronous reloads (assuming that databases are not reloaded simultaneously):
((50 + 70 + 130 * 4) + (5 + 7 + 13) + 130 + (130 * 4)) * 2 = 2630
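To see how much the asynchronous reload adds, both variants of the estimate can be compared side by side. Again, this is a sketch with illustrative variable names, using the example's values in MB and assuming that only the largest database is reloaded at any one time.

```shell
#!/bin/sh
# Terms from the example, in MB of on-disk size.
databases=$(( 50 + 70 + 130 * 4 ))   # product databases, channel 3 with 4 libs
suggest=$(( 5 + 7 + 13 ))            # estimated Suggest databases
import=130                           # import reserve (largest database)
async_reload=$(( 130 * 4 ))          # one extra copy of channel 3 during a reload

without_async=$(( (databases + suggest + import) * 2 ))
with_async=$(( (databases + suggest + import + async_reload) * 2 ))

echo "without asynchronous reload: ${without_async} MB"
echo "with asynchronous reload:    ${with_async} MB"
echo "additional memory:           $(( with_async - without_async )) MB"
```

The difference of 1040 MB is exactly the 130 * 4 * 2 MB that channel 3 occupies while loaded.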
If you have installed the recommended tcmalloc library, activate it via setenv.sh. This is done with the second export command in the file.
With these values set, setenv.sh could look like this:
export JAVA_OPTS="$JAVA_OPTS \
  -DFACTFINDER_RESOURCES=/opt/factfinder \
  -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey \
  -Xms512m \
  -Xmx3g"
export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Starting with FACT-Finder 7.3.4-0, it is recommended to set the thread stack size (-Xss) to 4m:
-Xss4m
The default value is -Xss2m.
Check whether tcmalloc was activated successfully by restarting Tomcat and entering this command:
sudo grep malloc /proc/[TOMCAT-PROCESS-ID]/maps
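If you want to run this check from a script, a small wrapper around grep is enough. The sketch below is an illustration only: the function name tcmalloc_loaded is hypothetical, and the commented usage assumes that pgrep is installed and that Tomcat was started through its usual main class org.apache.catalina.startup.Bootstrap.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given maps file mentions tcmalloc.
# $1 = path to a maps file, e.g. /proc/<pid>/maps
tcmalloc_loaded() {
    grep -q 'tcmalloc' "$1" 2>/dev/null
}

# Usage on a live system (run as root or as the Tomcat user):
#   pid=$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n 1)
#   if tcmalloc_loaded "/proc/$pid/maps"; then
#       echo "tcmalloc is active"
#   else
#       echo "tcmalloc is NOT active"
#   fi
```

The function returns a non-zero status both when the library is not mapped and when the maps file cannot be read, so a missing permission looks the same as a missing library.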
Optional Settings
The Java options above are essential for operating FACT-Finder. We also recommend the following additional settings. These are nonessential, and you should check them against any company-specific standards before applying them.
Omikron uses the file below for standard FACT-Finder operation. It contains the following parameters in addition to the default Tomcat and Java settings:
- RECOVERY_SCRIPT and -XX:OnOutOfMemoryError refer to Omikron's emergency script, so that data can be collected for analysis in case of an error. For more information, see "Nützliche Skripte" (Useful Scripts).
- -XX:OnOutOfMemoryError is executed by Java/Tomcat when it registers an OutOfMemory case.
- RECOVERY_SCRIPT is a FACT-Finder-specific variable that determines which script is run when the program detects a critical error and tries to restart automatically.
- productiveTomcat is used by the e.sh script to identify the Tomcat process for which it collects data and which it is supposed to terminate.
- The instance parameter is set to a value that identifies the system/search application. This is especially advisable in scenarios running multiple search applications.
export JAVA_OPTS="$JAVA_OPTS \
  -DFACTFINDER_RESOURCES=/opt/factfinder \
  -DFACTFINDER_KEY=/opt/factfinder/licence.ffkey \
  -Xms512m \
  -Xmx3g \
  -DRECOVERY_SCRIPT=/opt/factfinder/e.sh \
  -XX:OnOutOfMemoryError=/opt/factfinder/e.sh \
  -DproductiveTomcat \
  -Dinstance=$(hostname) \
  -Djava.awt.headless=true \
  -Djava.security.egd=file:/dev/./urandom \
  -Dcom.sun.management.jmxremote \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -XX:+PrintGCDateStamps \
  -XX:+UseGCOverheadLimit \
  -XX:GCHeapFreeLimit=25 \
  -XX:NewRatio=2 \
  -XX:MaxMetaspaceSize=2g \
  -XX:ErrorFile=/opt/factfinder/dumps/hs_err_pid\$\$.log \
  -Xloggc:/opt/factfinder/dumps/gclog.$(date +%F_%R).txt \
  -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true \
  -Dorg.apache.jasper.Constants.DEFAULT_TAG_BUFFER_SIZE=2000000 \
  -Dderby.stream.error.file=/opt/factfinder/dumps/derby.\$\$.log"
export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Tomcat
Essential Settings
Make sure Tomcat's URI encoding is set to UTF-8. You can check this via the attribute URIEncoding="UTF-8" within the Connector tags in server.xml. In our case, the file is located in /var/lib/tomcat8/conf. From Tomcat 8 on, UTF-8 is the default; earlier versions require adding the URIEncoding attribute manually.
Correctly configured, the Connector configuration should look like this:
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
URIEncoding="UTF-8"
redirectPort="8443" />
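A quick way to look for the attribute is a text search in server.xml. The following sketch uses a hypothetical function name, has_utf8_uri_encoding, and does a plain grep rather than parsing the XML, so it cannot tell which Connector carries the attribute or whether the match sits inside a comment. On Tomcat 8 and later, a missing attribute is not an error, since UTF-8 is the default there.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains URIEncoding="UTF-8".
# $1 = path to server.xml
has_utf8_uri_encoding() {
    grep -q 'URIEncoding="UTF-8"' "$1" 2>/dev/null
}

# Usage, with the path from this guide:
#   if has_utf8_uri_encoding /var/lib/tomcat8/conf/server.xml; then
#       echo "URIEncoding is set to UTF-8"
#   else
#       echo "URIEncoding is not set explicitly"
#   fi
```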
Recommended Settings
In Tomcat, you define thread pooling with an Executor. Set both the minSpareThreads and maxThreads attributes to 4 x the number of CPU cores. maxThreads defines the maximum number of threads in the pool, minSpareThreads the minimum number of threads kept alive. The prestartminSpareThreads attribute only makes sure that the minSpareThreads are already started when the executor is launched. The executor is linked to the connector by referencing the executor's name attribute in the connector's executor attribute.
We recommend these additional connector settings:
- Change protocol to org.apache.coyote.http11.Http11NioProtocol, since this is a non-blocking Java connector.
- Set acceptCount to a relatively high value. It determines how many queries are accepted when all Tomcat threads are busy.
- Activate GZIP compression via the compression, compressableMimeType and compressionMinSize attributes. In some cases (e.g. navigation), the search results can be very large, so that network transfer time becomes a large part of the total search time. compressableMimeType determines which responses are compressed, and compressionMinSize sets the minimum size above which a response is compressed.
<Executor
name="tomcatThreadPool" maxThreads="32"
minSpareThreads="32" prestartminSpareThreads="true"/>
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
connectionTimeout="20000"
URIEncoding="UTF-8"
redirectPort="8443"
executor="tomcatThreadPool"
acceptCount="1000"
compression="on"
compressableMimeType="text/html,text/xml,text/plain,text/javascript,text/css,application/json"
compressionMinSize="2048" />
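The thread pool size of 4 x the number of CPU cores can be derived on the target host itself. The sketch below is an illustration with a hypothetical function name, pool_size; it assumes that nproc (GNU coreutils) or getconf is available to report the number of cores.

```shell
#!/bin/sh
# Hypothetical helper: recommended pool size for a given number of CPU cores.
pool_size() {
    echo $(( $1 * 4 ))
}

# nproc is part of GNU coreutils; fall back to getconf where it is missing.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
threads=$(pool_size "$cores")

# Print the attribute values to copy into the Executor element.
echo "maxThreads=\"${threads}\" minSpareThreads=\"${threads}\""
```

On a machine with 8 cores, this yields the value 32 used in the Executor example above.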
Optional Settings
In our experience, modifying the AccessLogValve can be useful for adding more information to the access log files. Omikron's monitoring scripts require this format; to use them, you need to apply the configuration below.
The whitespace between the fields in the pattern attribute consists of tab characters, not spaces.
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log." suffix=".txt"
pattern="%h %t %S &quot;%r&quot; &quot;%{Referer}i&quot; %s %b %D"
resolveHosts="false"
buffered="true" />
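Because the fields are separated by tabs, the resulting log files are easy to process with standard tools. The following sketch is not one of Omikron's monitoring scripts: the function name avg_request_time is hypothetical, and it assumes that every log line has exactly the eight tab-separated fields of the pattern above, with the processing time (%D) as the last field.

```shell
#!/bin/sh
# Hypothetical helper: average of the last field (%D) over all lines of a
# tab-separated access log in the format above, truncated to an integer.
# Lines that do not have exactly eight fields are skipped.
# Prints nothing if no line matches.
avg_request_time() {
    awk -F '\t' 'NF == 8 { sum += $8; n++ }
                 END { if (n > 0) printf "%d\n", sum / n }' "$1"
}

# Usage (replace the argument with the path of one of your access log files):
#   avg_request_time /path/to/access_log_file
```

The value is reported in whatever unit the Tomcat version in use writes for %D.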