Configure Automatic SAP HANA Data Collection with HANASitter

How can we automate the conditional capture of specific SAP HANA information (e.g. triggering a runtime dump when a significant number of threads are blocked)?

SOLUTION

To analyze SAP HANA problems such as a non-accessible database (SAP Doc 1999020), bad performance (SAP Doc 2000000), lock contention (SAP Doc 1999998) or high CPU consumption (SAP Doc 2100040) efficiently, it is sometimes necessary to collect information while the problem exists. Often nobody remembers to collect it in time, or the situation has already resolved itself by the time traces or dumps are triggered.

SAP HANASitter lets you configure reactions, such as dumps or the collection of performance histories, that are triggered when specific conditions like high load are met.

Features of SAP HANASitter:

SAP HANASitter is a generic tool that replaces individual tools such as thrloop.
SAP HANASitter is implemented as a Python script.
The script is an expert tool designed by SAP. Users are allowed to use it, but they cannot hold SAP responsible for any problem that arises from its use.

To install SAP HANASitter, follow the steps below:

Download the attached script hanasitter.py.
Copy it to a directory on your SAP HANA database server.

Once SAP HANASitter is installed, you can start it.

The command below gives an overview of how SAP HANASitter works and of the available configuration options.

python hanasitter.py --help

Note: If SAP HANASitter is called without any options, it does nothing, so you always have to provide a set of options that matches your requirements.
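Because a call without options performs no action, it can help to build the command line in a small wrapper that refuses an empty option set. The following is only a sketch: the helper names `hanasitter_command` and `show_help` are hypothetical, and it assumes that `python` is on the PATH and that hanasitter.py sits in the current directory.

```python
import subprocess
from pathlib import Path


def hanasitter_command(script_path, options):
    """Return the argument list for a HANASitter call.

    `options` maps a flag such as "-nr" to its value. An empty mapping is
    rejected because a call without options performs no action.
    """
    if not options:
        raise ValueError("provide at least one option, e.g. {'-nr': 1}")
    command = ["python", str(script_path)]
    for flag, value in options.items():
        command += [flag, str(value)]
    return command


def show_help(script_path):
    """Run 'python hanasitter.py --help' to print the option overview."""
    return subprocess.run(["python", str(script_path), "--help"], check=False)


if __name__ == "__main__":
    script = Path("hanasitter.py")  # assumed location: current directory
    if script.exists():
        show_help(script)
    # Only prints the command; starting it is left to the caller.
    print(" ".join(hanasitter_command(script, {"-nr": 1, "-ct": "IS_ACTIVE,TRUE,30"})))
```

The wrapper only assembles and prints the command, so it can be reviewed before anything is started on the database server.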

The following command line options exist to adjust the behaviour:

Option	Default	Unit	Details
-ar	-1	s	Wait time after a recording before checks resume; if negative, the script exits after the recording
-cpu	0,0,100	-,s,%	Comma-separated list of three values:
			number of checks
			check interval
			max average CPU
-ct	IS_ACTIVE,TRUE,30		Definition of critical thread situation consisting of three comma-separated values:
			Column of M_SERVICE_THREADS
			Column value
			Maximum number of accepted threads showing this value
			Optional: Maximum accepted duration of these threads (s)
			If blanks are part of one value (e.g. in case of checking for value 'Semaphore Wait'), you have to replace the blanks with a minus (e.g. 'Semaphore-Wait').
			Multiple conditions can be concatenated, they are then evaluated with OR, e.g. "IS_ACTIVE,TRUE,30,THREAD_STATE,Semaphore-Wait,10" means that actions are triggered when there are at least 30 active threads or at least 10 threads with state "Semaphore Wait".
-dp	60	s	Kernel profiler trace duration, i.e. length of traced time frame
-ic	60	s	Call stack interval, i.e. time between two consecutive call stack collections
-ig	60	s	Indexserver gstack interval, i.e. time between two consecutive index server gstack collections
-ip	60	s	Kernel profiler trace interval, i.e. time between two consecutive kernel profiler traces
-ir	60	s	Runtime dump interval, i.e. time between two consecutive runtime dump collections
-k	SYSTEMKEY		Database user key (to be maintained in hdbuserstore)
-lt	0		Setting this parameter to 1 activates recording of all active threads (target directory: <output_directory>/active_threads_log).
-nc	0		Number of triggered call stacks in case a problem is observed
-ng	0		Number of triggered indexserver gstacks in case a problem is observed
-np	0		Number of triggered kernel profiler traces in case a problem is observed
-nr	0		Number of triggered runtime dumps in case a problem is observed
-od	/tmp/hana_sitter_output		Output directory, i.e. full path of the folder where output directories will end up (will be created in case it doesn't exist)
-oi	3600	s	Online test interval, i.e. the time it waits before it checks if the database is online again
-pi	60	s	Ping interval, i.e. the time it waits before it pings the database again (for responsiveness check)
-pt	60	s	Ping timeout, i.e. the time it waits for response after a ping before the database is considered as unresponsive
-rm	1		Recording mode:
			1 -> execute actions type by type (e.g. at first all runtime dumps, then all kernel profiler traces)
			2 -> execute actions round-robin
			3 -> execute actions concurrently
-so	1		Standard output switch:
			0: script output is not written to the standard output
			1: script output is written to the standard output
-ssl	FALSE		Activation / deactivation of SSL certificate
-tt	60	s	Thread check timeout, i.e. time it waits during thread check before the database is considered as unresponsive
-wp	0	ms	Kernel profiler wait time after call stacks of all active threads have been taken
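The -ct value is easy to mistype because every condition contributes three comma-separated fields and blanks must be replaced with a minus. A minimal sketch of a helper that assembles the value is shown here; the function name `ct_value` is hypothetical, and the optional fourth field (maximum duration) is deliberately left out of this sketch.

```python
def ct_value(conditions):
    """Build the value for -ct from (column, value, max_threads) tuples.

    Blanks inside a column value are replaced with a minus, and all
    conditions are concatenated with commas (they are evaluated with OR).
    """
    parts = []
    for condition in conditions:
        if len(condition) != 3:
            raise ValueError("each condition needs column, value and max_threads")
        column, value, max_threads = condition
        parts.append(str(column))
        parts.append(str(value).replace(" ", "-"))
        parts.append(str(int(max_threads)))
    return ",".join(parts)


if __name__ == "__main__":
    print(ct_value([
        ("IS_ACTIVE", "TRUE", 30),
        ("THREAD_STATE", "Semaphore Wait", 10),
    ]))
```

The result can be passed unchanged as the argument of -ct.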

The following table lists some examples of how to call SAP HANASitter for different purposes:

Command	Details
python hanasitter.py	No action
python hanasitter.py -oi 3600 -pt 60 -pi 60 -nr 3 -ir 60 -nc 3 -ic 60 -ct IS_ACTIVE,TRUE,30 -tt 60 -ar 60 -od /tmp/hana_sitter_output -lt 0 -so 1 -k SYSTEMKEY -cpu 0,0,100	Same as default call, but with explicitly provided default values
python hanasitter.py -ct IS_ACTIVE,TRUE,30,THREAD_STATE,Running,20,THREAD_STATE,Semaphore-Wait,20,THREAD_METHOD,Wait,30,LOCK_WAIT_NAME,BackupTimeout,10	Same as standard, but trigger data collection additionally in the following cases:
	At least 20 threads in state "Running" or
	At least 20 threads in state "Semaphore Wait" or
	At least 30 threads in method "Wait" or
	At least 10 threads waiting for "BackupTimeout"
python hanasitter.py -cpu 5,2,95	Standard behavior and additional trigger of actions when the CPU consumption is at least 95 % during 5 checks with an interval of 2 seconds
python hanasitter.py -lt 1	Standard behavior and additional recording of thread activities in a dedicated file in the active_threads_log subdirectory
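To make the three fields of the -cpu option in the example above more concrete, the sketch below parses a value such as 5,2,95 and applies one plausible reading of "max average CPU". The names `CpuCheck`, `parse_cpu_option` and `would_trigger` are hypothetical. The rule itself (trigger when the mean of the last N samples reaches the limit, and never trigger when the number of checks is 0) is an assumption for illustration, not HANASitter's confirmed logic.

```python
from typing import NamedTuple, Sequence


class CpuCheck(NamedTuple):
    checks: int       # number of checks
    interval_s: int   # check interval in seconds
    max_avg_pct: int  # max average CPU in percent


def parse_cpu_option(value: str) -> CpuCheck:
    """Split a -cpu value such as '5,2,95' into its three fields."""
    fields = value.split(",")
    if len(fields) != 3:
        raise ValueError("-cpu expects three comma-separated values")
    checks, interval_s, max_avg_pct = (int(field) for field in fields)
    return CpuCheck(checks, interval_s, max_avg_pct)


def would_trigger(check: CpuCheck, samples: Sequence[float]) -> bool:
    """Assumed rule: the mean of the last `checks` samples reaches the limit."""
    if check.checks <= 0 or len(samples) < check.checks:
        return False  # assumption: zero checks means the CPU check is off
    recent = samples[-check.checks:]
    return sum(recent) / len(recent) >= check.max_avg_pct


if __name__ == "__main__":
    check = parse_cpu_option("5,2,95")
    print(check, would_trigger(check, [96, 97, 95, 98, 99]))
```

The CPU samples are supplied by the caller here; how HANASitter itself measures CPU consumption is not covered by this sketch.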