How can we automate the conditional capture of specific SAP HANA information (e.g. triggering a runtime dump when a significant number of threads are blocked)?
SOLUTION
To analyze SAP HANA problems such as an inaccessible database (SAP Doc 1999020), poor performance (SAP Doc 2000000), lock contention (SAP Doc 1999998) or high CPU consumption (SAP Doc 2100040) efficiently, it is often necessary to collect information while the problem is occurring. In practice, this information is sometimes not collected at all, or the situation has already resolved by the time traces or dumps are triggered.
SAP HANASitter lets you configure reactions, such as dumps or the collection of performance histories, that are triggered when specific conditions such as high load are met.
Features of SAP HANASitter:
- SAP HANASitter is a generic tool that replaces individual tools such as thrloop.
- SAP HANASitter is implemented as a Python script.
- The script is an expert tool designed by SAP. Users are allowed to use it, but cannot hold SAP responsible for any problem that results from its use.
In order to install SAP HANASitter please follow the steps below:
- Download the attached script hanasitter.py.
- Copy it to a directory on your SAP HANA database server.
Once SAP HANASitter is installed, you can start it.
The following command provides an overview of how SAP HANASitter works and of the available configuration options:
- python hanasitter.py --help
Note: If SAP HANASitter is called without any options, it does nothing. You therefore always have to provide a set of options that matches your requirements.
The following command line options exist to adjust the behaviour:
Option | Default | Unit | Details |
-ar | -1 | s | Check interval; if the value is negative, SAP HANASitter exits |
-cpu | 0,0,100 | -,s,% | Comma-separated list of three values: number of CPU checks, interval between two checks (s) and CPU consumption threshold (%) |
-ct | IS_ACTIVE,TRUE,30 | | Definition of a critical thread situation, consisting of three comma-separated values: a thread attribute (e.g. IS_ACTIVE or THREAD_STATE), the value to match (e.g. TRUE) and the minimum number of matching threads (e.g. 30). |
| | | If a value contains blanks (e.g. when checking for 'Semaphore Wait'), replace each blank with a minus sign (e.g. 'Semaphore-Wait'). |
| | | Multiple conditions can be concatenated; they are then combined with OR. For example, "IS_ACTIVE,TRUE,30,THREAD_STATE,Semaphore-Wait,10" means that actions are triggered when there are at least 30 active threads or at least 10 threads in state "Semaphore Wait". |
-dp | 60 | s | Kernel profiler trace duration, i.e. length of traced time frame |
-ic | 60 | s | Call stack interval, i.e. time between two consecutive call stack collections |
-ig | 60 | s | Indexserver gstack interval, i.e. time between two consecutive index server gstack collections |
-ip | 60 | s | Kernel profiler trace interval, i.e. time between two consecutive kernel profiler traces |
-ir | 60 | s | Runtime dump interval, i.e. time between two consecutive runtime dump collections |
-k | SYSTEMKEY | | Database user key (to be maintained in hdbuserstore) |
-lt | 0 | | Setting this parameter to 1 activates recording of all active threads (target directory: <output_directory>/active_threads_log). |
-nc | 0 | | Number of triggered call stacks in case a problem is observed |
-ng | 0 | | Number of triggered indexserver gstacks in case a problem is observed |
-np | 0 | | Number of triggered kernel profiler traces in case a problem is observed |
-nr | 0 | | Number of triggered runtime dumps in case a problem is observed |
-od | /tmp/hana_sitter_output | | Output directory, i.e. full path of the folder where output directories will end up (will be created in case it doesn't exist) |
-oi | 3600 | s | Online test interval, i.e. the time it waits before it checks if the database is online again |
-pi | 60 | s | Ping interval, i.e. the time it waits before it pings the database again (for responsiveness check) |
-pt | 60 | s | Ping timeout, i.e. the time it waits for response after a ping before the database is considered as unresponsive |
-rm | 1 | | Recording mode: |
| |||
| |||
| |||
-so | 1 | | Standard output switch: |
| |||
| |||
-ssl | FALSE | | Activation / deactivation of SSL certificate |
-tt | 60 | s | Thread check timeout, i.e. time it waits during thread check before the database is considered as unresponsive |
-wp | 0 | ms | Kernel profiler wait time after call stacks of all active threads have been taken |
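The -ct syntax described in the table (groups of three values, blanks written as minus signs, several groups combined with OR) can be illustrated with a small Python sketch. This is not SAP HANASitter's own code: the function names format_ct, parse_ct and is_critical, and the thread_counts dictionary, are hypothetical helpers introduced here only to make the rules concrete.

```python
def format_ct(conditions):
    """Build a -ct value from (attribute, value, min_threads) tuples."""
    parts = []
    for attribute, value, min_threads in conditions:
        # Blanks inside a value have to be written as minus signs.
        parts.extend([attribute, value.replace(" ", "-"), str(min_threads)])
    return ",".join(parts)


def parse_ct(ct_value):
    """Split a -ct value into (attribute, value, min_threads) tuples."""
    fields = ct_value.split(",")
    if len(fields) % 3 != 0:
        raise ValueError("-ct expects groups of three comma-separated values")
    return [
        (fields[i], fields[i + 1], int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]


def is_critical(ct_value, thread_counts):
    """Return True if at least one condition is met (OR semantics).

    thread_counts is a hypothetical input that maps (attribute, value)
    to the number of threads currently matching that pair.
    """
    # Normalise observed values the same way the option value is written.
    observed = {
        (attribute, value.replace(" ", "-")): count
        for (attribute, value), count in thread_counts.items()
    }
    return any(
        observed.get((attribute, value), 0) >= min_threads
        for attribute, value, min_threads in parse_ct(ct_value)
    )


ct = format_ct([("IS_ACTIVE", "TRUE", 30), ("THREAD_STATE", "Semaphore Wait", 10)])
print(ct)
print(is_critical(ct, {("IS_ACTIVE", "TRUE"): 12, ("THREAD_STATE", "Semaphore Wait"): 10}))
```

In this sketch the second call reports a critical situation because the "Semaphore Wait" condition is met, even though fewer than 30 threads are active.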
The following table lists some examples of how to call SAP HANASitter for different purposes:
Command | Details |
python hanasitter.py | No action |
python hanasitter.py -oi 3600 -pt 60 -pi 60 -nr 3 -ir 60 -nc 3 -ic 60 -ct IS_ACTIVE,TRUE,30 -tt 60 -ar 60 -od /tmp/hana_sitter_output -lt 0 -so 1 -k SYSTEMKEY -cpu 0,0,100 | Same as default call, but with explicitly provided default values |
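python hanasitter.py -ct IS_ACTIVE,TRUE,30,THREAD_STATE,Running,20,THREAD_STATE,Semaphore-Wait,20,THREAD_METHOD,Wait,30,LOCK_WAIT_NAME,BackupTimeout,10 | Same as standard, but additionally trigger data collection when there are at least 20 threads in state "Running", at least 20 threads in state "Semaphore Wait", at least 30 threads with thread method "Wait", or at least 10 threads with lock wait name "BackupTimeout" |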
python hanasitter.py -cpu 5,2,95 | Standard behavior and additional trigger of actions when the CPU consumption is at least 95 % during 5 checks with an interval of 2 seconds |
python hanasitter.py -lt 1 | Standard behavior and additional recording of thread activities in a dedicated file in the active_thread_log subdirectory |
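When several options are combined, it can help to assemble the call programmatically. The following Python sketch only builds and prints a command line in the form used in the table above. The helper name build_command is hypothetical, the chosen options are just an example, and the sketch assumes a Python 3.8+ interpreter for the helper itself (for shlex.join), which may differ from the interpreter used to run hanasitter.py.

```python
import shlex


def build_command(options, script="hanasitter.py"):
    """Return an argument list such as ['python', 'hanasitter.py', '-lt', '1'].

    options maps option names without the leading minus (e.g. 'cpu')
    to their values.
    """
    argv = ["python", script]
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv


# Example: CPU trigger plus recording of active threads.
options = {"cpu": "5,2,95", "lt": 1, "od": "/tmp/hana_sitter_output"}
argv = build_command(options)
print(shlex.join(argv))

# To actually start the tool, the list could be passed to subprocess.run(argv)
# from the directory that contains hanasitter.py.
```

Passing the options as a list instead of a single string avoids shell quoting problems if a value ever contains special characters.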