FAQ: SAP HANA Support Script
SAP HANA Memory Limits
Query:
If left unconfigured, each installed and running HANA instance may use up to 97% (90% in older HANA revisions) of the system’s memory. If multiple unconfigured or misconfigured HANA systems are running on the same machine(s), "Out of Memory" situations may occur. In this case the so-called "OOM Killer" of Linux gets triggered, which will terminate running processes at random and in most cases will kill SAP HANA or GPFS first, leading to service interruption. An unconfigured HANA system is a system lacking a global_allocation_limit setting in the HANA system’s global.ini file. Misconfigured SAP HANA systems are multiple systems running at the same time with a combined memory limit over 90% of the physically installed memory.
Ans:
Please configure the global allocation limit for all systems running at the same time. This can be done by setting the global_allocation_limit parameter in the systems’ global.ini configuration files. Please calculate the combined memory allocation for HANA so that at least 25GB are free for other programs. Please use only the physically installed memory for your calculation.
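The calculation can be sketched in shell as follows. This is only a minimal sketch and not part of the support script: the function names are hypothetical, and it assumes that the limits are expressed in MB.

```shell
# Hypothetical helpers; all limits are assumed to be in MB.

# Largest combined HANA limit (MB) that still leaves 25 GB of the
# physically installed memory (given in GB) free for other programs.
hana_budget_mb() {
    echo $(( ($1 - 25) * 1024 ))
}

# Succeeds if the sum of all per-system limits (MB) fits the budget.
# Usage: hana_limits_fit <physical_gb> <limit_mb>...
hana_limits_fit() {
    budget=$(hana_budget_mb "$1")
    shift
    sum=0
    for limit in "$@"; do
        sum=$(( sum + limit ))
    done
    [ "$sum" -le "$budget" ]
}
```

For example, on a host with 512 GB of RAM, "hana_limits_fit 512 262144 229376" checks whether two systems limited to 256 GB and 224 GB stay within the budget.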
GPFS parameter readReplicaPolicy
Query:
Older cluster installations do not have the GPFS parameter "readReplicaPolicy" set to "local" which may improve performance in certain cases. Newer cluster installations have this value set and single nodes are not affected by this parameter at all. It is recommended to configure this value.
Ans:
Execute the following command on any cluster node at any time:
# mmchconfig readReplicaPolicy=local
This can be done during normal operation and the change becomes effective immediately for the whole GPFS cluster and is persistent over reboots.
SAP HANA Memory Limit on XS sized Machines
Query:
For a general description of the SAP HANA memory limit see 1: SAP HANA Memory Limits. XS sized servers have only 128GB RAM installed of which even a single SAP HANA system will use up to 93.5% equaling 119GB (older revisions of HANA used 90% = 115GB) if no lower memory limit is configured. This leaves too little memory for other processes which may trigger Out-Of-Memory situations causing crashes.
Ans:
Please configure the global allocation limit for the installed SAP HANA system to a more appropriate value. The recommended value is 112GB if the GPFS page pool size is set to 4GB (see 12: GPFS pagepool should be set to 4GB) and 100GB or less if the GPFS page pool is set to 16GB. If multiple systems are running at the same time, please calculate the total memory allocation for HANA so the sum does not exceed the recommended value. Please use only the physically installed memory for your calculation.
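The two recommendations can be captured in a small lookup. This is a sketch only; the function name is hypothetical and it covers just the two pagepool sizes named in this entry.

```shell
# Hypothetical helper: recommended combined HANA limit (GB) on a
# 128 GB XS server, depending on the GPFS pagepool size (GB).
xs_recommended_limit_gb() {
    case "$1" in
        4)  echo 112 ;;
        16) echo 100 ;;   # upper bound; this entry says "100GB or less"
        *)  echo "no recommendation for pagepool=${1}GB" >&2
            return 1 ;;
    esac
}
```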
Overlapping NSDs
Query:
Under some rare conditions single node SSD or XS/S gen 2 models may be installed with overlapping NSDs. Overlapping means that the whole drive (e.g. /dev/sdb) as well as a partition on the same device (e.g. /dev/sdb2) are configured as NSDs in GPFS. As GPFS is writing data on both NSDs, each NSD will overwrite and corrupt data on the other NSD. Eventually the whole-device NSD overwrites the partition table, the partition NSD is lost, and GPFS fails. This is usually the point at which the problem is noticed.
Consider any data stored in the hana shared filesystem (/sapmnt or /hana) to be corrupted even if the file system check finds no errors.
Ans:
The only solution is to reinstall the appliance from scratch. To prevent installing with the same error again, the single node installation must be completed in phase 2 of the guided installation.
Do not deselect "Single Node Installation".
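To look for the overlap described in this entry, the NSD device paths can be compared against each other. The following is a minimal sketch with a hypothetical function name. It expects one device path per line on standard input and leaves it to you to extract those paths, for example from the device column of the mmlsnsd output (the exact column layout is an assumption and may differ between GPFS versions).

```shell
# Hypothetical helper: reads device paths (one per line) and prints
# "<whole device> <partition>" for every partition whose parent
# device is also in the list.
find_overlapping_nsds() {
    awk '
        { devs[$1] = 1 }
        END {
            for (d in devs) {
                parent = d
                # Strip a trailing partition number, e.g. /dev/sdb2 -> /dev/sdb
                if (sub(/[0-9]+$/, "", parent) && (parent in devs))
                    print parent, d
            }
        }
    ' | sort
}
```

No output means that no whole-device/partition pair was found in the list.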
Missing RPMs
Query:
An upgrade of SAP HANA or another SAP software component fails because of missing dependencies. As some of these package dependencies were added by SAP HANA after your system was initially installed, you may install those missing packages and still receive full support of the IBM Systems solution. If you no longer have the SLES for SAP DVD that had been delivered with your system, you may obtain it again from the SUSE Customer Center.
Ans:
Ensure that the packages listed below are installed on your appliance.
libuuid
gtk2 - Added for HANA Developer Studio
java-1_6_0-ibm - Added for HANA Developer Studio
libicu - Added since revision 48 (SPS04)
mozilla-xulrunner192-* - Added for HANA Developer Studio
ntp
sudo
syslog-ng
tcsh
libssh2-1 - Added since revision 53 (SPS05)
expect - Added since revision 53 (SPS05)
autoyast2-installation - Added since revision 53 (SPS05)
yast2-ncurses - Added since revision 53 (SPS05)
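A quick way to see which of these packages are missing is to compare the list with the installed package names. This is a minimal sketch with a hypothetical function name. It reads the installed names from standard input, and the wildcard entry mozilla-xulrunner192-* is not covered because only exact names are matched.

```shell
# Hypothetical helper: prints every required package (arguments) that
# does not appear in the list of installed names read from stdin.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}
```

On the appliance it could be fed with the output of rpm -qa --qf '%{NAME}\n', for example "rpm -qa --qf '%{NAME}\n' | missing_packages libuuid gtk2 libicu ntp sudo syslog-ng tcsh libssh2-1 expect".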
Missing packages can be installed from the SLES for SAP DVD shipped with your appliance using the following instructions. You can add the DVD that was included with your appliance as a repository and install the necessary RPM packages from there. First check whether the SUSE Linux Enterprise Server media is already added as a repository:
# zypper repos
# | Alias | Name | Enabled | Refresh
1 | SUSE-Linux-... | SUSE-Linux-... | Yes | No
If it doesn’t exist, please place the DVD in the drive (or add it via the Virtual Media Manager) and add it as a repository. This example uses the SLES for SAP 11 SP1 media.
# zypper addrepo --type yast2 --gpgcheck --no-keep-packages --refresh --check dvd:///?devices=/dev/sr1 "SUSE-Linux-Enterprise-Server-11-SP1_11.1.1"
This is a changeable read-only media (CD/DVD), disabling autorefresh.
Adding repository 'SLES-for-SAP-Applications 11.1.1' [done]
Repository 'SUSE-Linux-Enterprise-Server-11-SP1_11.1.1' successfully added
Enabled: Yes
Autorefresh: No
GPG check: Yes
URI: dvd:///?devices=/dev/sr1
Reading data from 'SUSE-Linux-Enterprise-Server-11-SP1_11.1.1' media
Retrieving repository 'SUSE-Linux-Enterprise-Server-11-SP1_11.1.1' metadata [done]
Building repository 'SUSE-Linux-Enterprise-Server-11-SP1_11.1.1' cache [done]
The drawback of this solution is that the DVD always has to be inserted in the DVD drive or mounted via VMM or KVM. Another possibility is to copy the DVD to a local directory and add that directory to zypper as a repository. First find out whether the existing repository is a DVD repository:
# zypper lr -u
# | Alias | Name | Enabled | Refresh | URI
1 | SUSE-Linux-Enterprise-Server-11-SP3 11.3.3-1.138 | SUSE-Linux-Enterprise-Server-11-SP3 11.3.3-1.138 | Yes | No | cd:///?devices=/dev/sr0
Copy the DVD to a local Directory
# cp -r /media/SLES-11-SP3-DVD*/* /var/tmp/install/sles11/ISO/
Register the directory as a repository to zypper
# zypper addrepo --type yast2 --gpgcheck --no-keep-packages -f file:///var/tmp/install/sles11/ISO/ "SUSE-Linux-Enterprise-Server-11-SP3"
Adding repository 'SUSE-Linux-Enterprise-Server-11-SP3' [done]
Repository 'SUSE-Linux-Enterprise-Server-11-SP3' successfully added
Enabled: Yes
Autorefresh: Yes
GPG check: Yes
URI: file:/var/tmp/install/sles11/ISO/
For verification you can list the repositories again. You should see output similar to this:
# zypper lr -u
# | Alias | Name | Enabled | Refresh | URI
1 | SUSE-Linux-Enterprise-Server-11-SP3 | SUSE-Linux-Enterprise-Server-11-SP3 | Yes | Yes | file:/var/tmp/install/sles11/ISO/
2 | SUSE-Linux-Enterprise-Server-11-SP3 11.3.3-1.138 | SUSE-Linux-Enterprise-Server-11-SP3 11.3.3-1.138 | Yes | No | cd:///?devices=/dev/sr0
Then search to ensure that the package can be found. This example searches for libssh.
# zypper search libssh
Loading repository data...
Reading installed packages...
S | Name | Summary | Type
| libssh2-1 | A library implementing the SSH2 ... | package
Then install the package:
# zypper install libssh2-1
Loading repository data...
Reading installed packages...
Resolving package dependencies...
1 new package to install.
Overall download size: 55.0 KiB. After the operation, additional 144.0 KiB will be used.
Continue? [y/n/?] (y):
Retrieving package libssh2-1-0.19.0+20080814-2.16.1.x86_64 (1/1), 55.0 KiB (144.0 KiB unpacked)
Retrieving: libssh2-1-0.19.0+20080814-2.16.1.x86_64.rpm [done]
Installing: libssh2-1-0.19.0+20080814-2.16.1 [done]
CPU Governor set to ondemand
Query:
Linux is using a technology for power saving called "CPU governors" to control CPU throttling and power consumption. By default Linux uses the governor "ondemand" which will dynamically throttle CPUs up and down depending on CPU load. SAP advised to use the governor "performance" as the ondemand governor will impact HANA performance due to too slow CPU upscaling by this governor.
Since appliance version 1.5.53-5 (or simply SLES4SAP 11 SP2 based appliances) we changed the CPU governor to performance. In case of an upgrade you also need to change the governor setting. If you are still running SLES4SAP 11 SP1 based appliances, you may also change this setting to trade in power saving for performance. This performance boost was not quantified by the development team.
Ans:
On all nodes append the following lines to the file /etc/rc.d/boot.local:
bios_vendor=$(/usr/sbin/dmidecode -s bios-vendor)
# Phoenix Technologies LTD means we are running in a VM and governors are not available
if [ $? -eq 0 -a ! -z "${bios_vendor}" -a "${bios_vendor}" != "Phoenix Technologies LTD" ]; then
/sbin/modprobe acpi_cpufreq
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
echo performance > $i
done
fi
The setting will change on the next reboot. You can also safely change the governor settings immediately by executing the same lines in the shell. Copy & paste all the lines at once, or type them one by one.
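To verify the result, the governor files can be read back. The following is a minimal sketch; the function name is hypothetical, and the optional argument exists only so that a different sysfs base directory can be passed.

```shell
# Hypothetical helper: lists every CPU whose governor is not "performance".
# Usage: non_performance_cpus [sysfs cpu directory]
non_performance_cpus() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # no cpufreq support (e.g. in a VM)
        gov=$(cat "$f")
        [ "$gov" = "performance" ] || echo "$f: $gov"
    done
}
```

No output means that every CPU exposing a governor is already set to performance.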
No diskspace left bug (Bug IV33610)
Query:
Starting HANA fails due to insufficient diskspace. The following error message will be found in indexserver or nameserver trace:
Error during asynchronous file transfer, rc=28: No space left on device.
Using the command ’df’ will show that there is still diskspace left. This problem is due to a bug in GPFS versions between 3.4.0-12 and 3.4.0-20 which will cause GPFS to step into a read-only mode. See 1846872.
Ans:
Make sure to shut down all HANA nodes by issuing the shutdown command from the studio, or log in via ssh as the sidadm user. Then run:
HDB info
to see if there are any HANA processes running. If there are, run
kill -9 proc_pid
to shut them down, one by one.
Download and apply GPFS version 3.4.0.23. In the Operations Guide, refer to the section 7.4: Updating GPFS on page 82 for information about how to upgrade GPFS.
Note: It is recommended that you consider upgrading your GPFS version from 3.4 to 3.5 as support for GPFS 3.4 has been discontinued by IBM.
SAP highly recommends that you run uniqueChecker.py script after patching GPFS to make sure that your database is consistent.
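Whether an installed GPFS level falls into the affected range can be checked with a small helper. This is a sketch under two assumptions: the function name is hypothetical, and "between 3.4.0-12 and 3.4.0-20" is read as including both end points.

```shell
# Hypothetical helper: succeeds if the given GPFS version (3.4.0-NN or
# 3.4.0.NN) lies in the range affected by IV33610 (assumed inclusive).
gpfs_affected_by_iv33610() {
    case "$1" in
        3.4.0-*|3.4.0.*) patch=${1#3.4.0?} ;;   # keep only the patch level
        *) return 1 ;;
    esac
    case "$patch" in
        ''|*[!0-9]*) return 1 ;;                # not a plain number
    esac
    [ "$patch" -ge 12 ] && [ "$patch" -le 20 ]
}
```

The version string has to be supplied by you, for example taken from the version of the installed gpfs.base package.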
Setting C-States
Query:
Poor performance of SAP HANA due to Intel processor settings.
Ans:
As recommended in the SAP Notes
1824819 SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP2 and
1954788 SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP3
and additionally described in the IBM RETAIN Tip H207000
Linux Ignores C-State Settings in Unified Extensible Firmware Interface (UEFI), the control (’C’) states of the Intel processor should be turned off for the most reliable performance of SAP HANA.
By default C-States are enabled in the UEFI due to the fact that we set the processor to Customer Mode. With C-States being turned on you might see performance degradations with SAP HANA. We recommend turning off the processor C-States using the Linux kernel boot parameter:
processor.max_cstate=0
The Linux kernel used by SAP HANA includes a built-in driver (’intel_idle’) which will ignore any C-State limits imposed by Basic Input/Output System (BIOS)/Unified Extensible Firmware Interface (UEFI) when it is active.
This driver may cause issues by enabling C-States even though they are disabled in the BIOS or UEFI. This can cause minor latency as the CPUs transition out of a C-State and into a running state. This is not the preferred state for the SAP HANA appliance and must be changed.
To prevent the ’intel_idle’ driver from ignoring BIOS or UEFI settings for C-States, add the following start parameter to the kernel’s bootloader configuration file:
intel_idle.max_cstate=0
Append both parameters to the end of the kernel command line of your bootloader (/boot/grub/menu.lst) and reboot the server.
Warning: For clustered configurations, this change needs to be made on each server of the cluster. Only make this change when all servers can be rebooted at once, or when you have an active stand-by node to take over the HANA services of the rebooting system. Do not reboot more servers at a time than there are active stand-by nodes.
For further information please refer to the SUSE knowledgebase article.
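After the reboot you can check that both parameters reached the running kernel. A minimal sketch, with a hypothetical function name, that inspects a kernel command line passed as its argument:

```shell
# Hypothetical helper: prints each required C-State parameter that is
# missing from the given kernel command line.
missing_cstate_params() {
    cmdline=" $1 "
    for p in processor.max_cstate=0 intel_idle.max_cstate=0; do
        case "$cmdline" in
            *" $p "*) ;;          # parameter present
            *) echo "$p" ;;       # parameter missing
        esac
    done
}
```

On a running system it would be called as missing_cstate_params "$(cat /proc/cmdline)"; no output means both parameters are active.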
ServeRAID M5120 RAID Adapter FW Issues
Query:
After the initial release of the new X6-based servers (x3850 X6, x3950 X6) a serious issue in various firmware versions of the ServeRAID M5120 RAID adapter has been found which can trigger continuous controller resets. This happens only under heavy load and each controller reset may cause service interruption. Certain firmware versions do not exhibit this issue, but these versions show severely degraded I/O performance. Only servers using the ServeRAID M5120 controller for attaching an external SAS enclosure are affected.
Future appliance versions will have the workaround for the controller reset issue preinstalled, while the performance issue can only be solved by an up- or downgrade to an unaffected firmware version.
Non-exhaustive list of known affected firmware versions:
Controller resets: 23.7.1-0010, 23.12.0-0011, 23.12.0-0016, 23.12.0-0019
Lowered Performance: 23.16.0-0018, 23.16.0-0027
Ans:
The current recommendation is to use firmware version 23.22.0-0024 (or newer, if listed as stable by IBM/Lenovo SAP HANA Team)
and to change the following configuration value in the installed OS. Both can be done after installation.
Changing Queue Depth
On the installed appliance, please edit /etc/init.d/ibm{lenovo}-saphana
and change the lines
function start() {
QUEUESIZE=1024
for i in /sys/block/sd* ; do
if [ -d $i ]; then
echo $QUEUESIZE > $i/queue/nr_requests
fi
done
to this version (if not already set)
function start() {
QUEUESIZE=1024
QUEUEDEPTH=250
for i in /sys/block/sd* ; do
if [ -d $i ]; then
echo $QUEUESIZE > $i/queue/nr_requests
echo $QUEUEDEPTH > $i/device/queue_depth
fi
done
by inserting lines 3 & 7. The new settings will be set on the next reboot or by calling
# service ibm-saphana start
Please ignore any output.
Use recommended Firmware version
1. Check which FW Package Build is installed on all M5120 RAID controllers:
# /opt/MegaRAID/storcli/storcli64 -AdpAllInfo -aAll | grep 'M5120' -B 5 -A 3
Adapter #1
Versions
Product Name : ServeRAID M5120
Serial No : xxxxxxxxxx
FW Package Build: 23.22.0-0024
2. Currently, version 23.22.0-0024 is recommended. Download the 23.22.0-0024 FW package for ServeRAID 5100 SAS/SATA adapters via IBM fixcentral or use following direct link: ibm.biz/BdRatD .
Make the downloaded file executable and then run it:
chmod +x ibm_fw_sraidmr_5100-23.22.0-0024_linux_32-64.bin
./ibm_fw_sraidmr_5100-23.22.0-0024_linux_32-64.bin -s
3. Please reboot the server after updating all M5120 controllers.
4. After reboot: Check if the queue depth is set to 250 for all devices on M5120 RAID controller:
for dev in $(lsscsi |grep -i m5120 |grep -E -o '/dev/sd[a-z]+'| cut -d '/' -f3) ; do cat /sys/block/${dev}/device/queue_depth ; done
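The firmware lists from this entry can be turned into a simple lookup for the "FW Package Build" value. This is only a sketch with a hypothetical function name. Because the list of affected versions is non-exhaustive, any version not named in this entry is reported as unknown rather than as safe.

```shell
# Hypothetical helper: classifies an M5120 FW Package Build according
# to the versions named in this FAQ entry.
m5120_fw_status() {
    case "$1" in
        23.7.1-0010|23.12.0-0011|23.12.0-0016|23.12.0-0019)
            echo "controller resets" ;;
        23.16.0-0018|23.16.0-0027)
            echo "lowered performance" ;;
        23.22.0-0024)
            echo "recommended" ;;
        *)
            echo "unknown" ;;
    esac
}
```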
GPFS Parameter enableLinuxReplicatedAIO
With GPFS version 3.5.0-13 the new GPFS parameter enableLinuxReplicatedAIO was introduced.
Please note the following:
Single node installations:
Single node installations are not affected by this parameter. It can be set to "yes" or "no".
Cluster installations:
GPFS 3.5.0-13 - 3.5.0-15: The parameter must be set to "no". When upgrading to GPFS 3.5.0-16 or higher you have to manually set the value to "yes".
Warning: Instead of setting the parameter to "no" we recommend to upgrade GPFS to 3.5.0-16 or higher.
GPFS 3.5.0-16 or higher: The parameter must be set to "yes".
DR cluster installations:
The parameter must be set to "yes".
The support script (saphana-support-ibm.sh) checks if the parameter is set correctly. If it is not set correctly, adjust the setting:
# mmchconfig enableLinuxReplicatedAIO=no
# mmchconfig enableLinuxReplicatedAIO=yes
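The rules above can be summarized in a small decision helper. This is a sketch only: the function name and the installation type keywords (single, cluster, dr) are hypothetical, and versions are expected in the form 3.5.0-14 without zero-padded fields.

```shell
# Hypothetical helper: prints the required enableLinuxReplicatedAIO value.
# Usage: required_replicated_aio <single|cluster|dr> <gpfs version>
required_replicated_aio() {
    install_type=$1
    version=$2
    case "$install_type" in
        single)  echo any; return 0 ;;   # not affected, "yes" or "no"
        dr)      echo yes; return 0 ;;
        cluster) ;;
        *)       return 2 ;;
    esac
    # Split e.g. "3.5.0-14" into the four fields 3 5 0 14.
    set -- $(printf '%s' "$version" | tr '.-' '  ')
    [ $# -eq 4 ] || return 2
    key=$(( $1 * 1000000 + $2 * 10000 + $3 * 100 + $4 ))
    if [ "$key" -lt 3050013 ]; then
        echo "n/a"                       # parameter not yet introduced
    elif [ "$key" -le 3050015 ]; then
        echo no
    else
        echo yes
    fi
}
```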
GPFS NSD on Devices with GPT Labels
Query:
On very rare occasions GPFS NSDs may be created on devices with a GUID Partition Table (GPT). When the NSD is created, parts of the primary GPT header are overwritten. Newer UEFI firmware releases offer an option to repair damaged GPTs and, if activated, the UEFI may try to recover the primary GPT from the backup copy during boot-up. This will destroy the NSD and lead to the loss of all data in the GPFS filesystem.
Ans:
If the support script pointed you to this FAQ entry, please contact IBM Support via SAP’s OSS Ticket System and put the message on the Queue BC-OP-LNX-LENOVO. Please prepare a support script dump as described in
1661146 - Lenovo/IBM Check Tool for SAP HANA appliances
The IBM/Lenovo support will then devise a solution for your installation.
When the ASU tool is installed, run the command
# /opt/lenovo/toolscenter/asu/asu64 show | grep -i gpt
If the Lenovo System Solution for SAP HANA Platform Edition was installed with an ISO image below version 1.9.96-13, the ASU tool will reside in directory: /opt/ibm/toolscenter/asu
The setting has various names, but any variable named GPT and Recovery should be set to "None". If it is set to "Automatic" do not reboot the system. If there is no such setting, do not upgrade the UEFI firmware until the GPTs have been cleared.
Use the installed ASU tool to change the GPT recovery parameter to "None" and reboot the system afterwards.
Assuming that "asu64 show | grep -i gpt" returned "DiskGPTRecovery.DiskGPTRecovery=Automatic" the command would be:
# /opt/lenovo/toolscenter/asu/asu64 set DiskGPTRecovery.DiskGPTRecovery None
As a second option you may download and install the ASU tool on another server and modify the UEFI settings via remote IMM access. Please download the ASU tool via www-947.ibm.com/support/entry/portal/docdisplay?lndocid=lnvo-asu and consult the ASU documentation for further details.
Or boot into UEFI and complete the following steps:
1. Reboot the server.
2. When the prompt <F1> Setup is displayed, press F1
3. From the setup utility main menu, select System Settings > Recovery and RAS > Disk GPT Recovery.
4. Change Disk GPT Recovery to <None>.
5. Exit and save settings.
GPFS pagepool should be set to 4GB
Query:
GPFS in your appliance is configured to use 16GB RAM for its so called pagepool. Recent tests showed that the size of this pagepool can be safely reduced to 4GB which will yield 12GB of memory for other running processes. Therefore it is recommended to change this parameter on all appliance installations and versions. Updated versions of the support script will warn if the pagepool size is not 4GB and will refer to this FAQ entry.
Ans:
Please change the pagepool size to 4GB. Execute
# mmchconfig pagepool=4G
to change the setting cluster-wide. This means the command needs to be run only once, on single node as well as on clustered installations.
The pagepool is allocated during the startup of GPFS, so a GPFS restart is required to activate the new setting. Please stop HANA and any processes that access GPFS filesystems before restarting GPFS. To restart GPFS execute
# mmshutdown
# mmstartup
In clusters all nodes need to be restarted. You can do this one node at a time or restart all nodes at once by adding the parameter -a to both commands. In the latter case please make sure no program is accessing GPFS filesystems on any node.
To verify the configured pagepool size run
# mmlsconfig | grep pagepool
To verify the current active pagepool size run
# mmdiag --config
and search for the pagepool line. This value is shown in bytes.
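Since the active value is reported in bytes, a small filter can convert it for comparison with the 4GB target. This is a minimal sketch with a hypothetical function name. It assumes that the mmdiag --config output contains a line in which the word pagepool is directly followed by the size in bytes.

```shell
# Hypothetical helper: reads "mmdiag --config" style output from stdin
# and prints the active pagepool size in GiB.
active_pagepool_gib() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "pagepool") {
                printf "%d\n", $(i + 1) / 1073741824
                exit
            }
    }'
}
```

It would be used as "mmdiag --config | active_pagepool_gib"; the result should be 4 once the new setting is active.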
Limit Page Cache Pool to 4GB
Query:
SLES offers an option to limit the size of the page cache pool. By default the page cache size is unlimited. SAP recommends in 1557506 - Linux paging improvements to limit this page cache to 4GB of RAM. This may improve resilience against Out-Of-Memory events.
Future appliance software versions will set this value by default. RHEL currently does not offer this option.
Ans:
Add the following line to file /etc/sysctl.conf:
vm.pagecache_limit_mb = 4096
and run
# sysctl -e -p
to activate this value without a reboot. This change can be done without a downtime.
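If the line is added by a script, it helps to avoid duplicate entries when the script runs more than once. The following is a minimal sketch with a hypothetical function name. It drops any existing assignment of the key and appends the new one, and it takes the file name as an argument so that it can be tried on a copy first.

```shell
# Hypothetical helper: sets "key = value" in a sysctl-style file,
# replacing an existing assignment of the same key.
# Usage: set_sysctl_value <file> <key> <value>
set_sysctl_value() {
    file=$1
    key=$2
    value=$3
    # Escape the dots so the key is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    [ -f "$file" ] || : > "$file"
    grep -v -E "^[[:space:]]*${key_re}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"     # keeps owner and permissions of the file
    rm -f "$tmp"
}
```

For this entry the call would be "set_sysctl_value /etc/sysctl.conf vm.pagecache_limit_mb 4096", followed by reloading the settings with sysctl -e -p.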
restripeOnDiskFailure and start-disks-on-startup
GPFS 3.5 and higher come with the new parameter restripeOnDiskFailure. The GPFS callback script start-disks-on-startup automatically installed on the Lenovo Solution is superseded by this parameter – IBM GPFS NSDs are automatically started on startup when restripeOnDiskFailure is activated.
On DR cluster installations, neither the callback script nor restripeOnDiskFailure should be activated.
Ans:
To enable the new parameter on all nodes in the cluster execute:
# mmchconfig restripeOnDiskFailure=yes -N all
To remove the now unnecessary callback script 'start-disks-on-startup' execute:
# mmdelcallback start-disks-on-startup
Rapid repair on GPFS 4.1
"Rapid repair" is a new functionality introduced in IBM GPFS 4.1, which enables replication on block level. As a result replication time is reduced considerably.
If you are running GPFS 4.1.0 up to and including GPFS 4.1.1-1:
- It is unsafe to have rapid repair enabled!
- Upgrade to GPFS 4.1.1-2 or higher as soon as possible.
- If an upgrade is not possible at the moment, disable rapid repair temporarily until you have upgraded to GPFS 4.1.1-2. See procedure below.
If you are running GPFS 4.1.1-2 or higher:
- It is safe to enable rapid repair.
- Rapid repair brings performance improvements. Enable it by following the procedure below.
Before enabling or disabling rapid repair, SAP HANA must be stopped and all GPFS filesystems unmounted. There must not be any filesystem access while changing this setting!
# mmdsh service sapinit stop # Stop HANA on all nodes
# mmdsh killall hdbrsutil # Stop this process on all nodes
# mmumount all -a # Unmount all GPFS filesystems on all nodes
If the mmumount command fails, there are still processes accessing the shared filesystem. Stop them, then try unmounting the filesystem again.
For enabling rapid repair please use this command (where fs is e.g. sapmntdata):
# mmchfs <fs> --rapid-repair
For disabling please use this command:
# mmchfs <fs> --norapid-repair
After this you can mount the GPFS filesystem and start HANA again:
# mmmount all -a
Parameter changes for performance improvements
With release 1.10.102-14 some parameters were changed to improve the performance. These changes should also be implemented on appliances that were set up with older installation media.
1) sysctl parameter vm.min_free_kbytes:
Add the line vm.min_free_kbytes = 2097152 to file /etc/sysctl.conf. Then reload the sysctl settings via:
# sysctl -e -p
2) IBM GPFS log file size (only applicable on GPFS based installations):
Update GPFS to at least version 4.1.1-2, then run the command to increase the log file size to 512MB:
# mmchfs sapmntdata -L 512M
If your GPFS filesystem has a different name, replace sapmntdata with the correct name.
A restart of the GPFS daemon on every node in the GPFS cluster is mandatory to apply the changes.
3) IBM GPFS ignorePrefetchLUNCount parameter (only applicable on GPFS based installations):
Update GPFS to at least version 4.1.1-2, then run the command to enable the parameter:
# mmchconfig ignorePrefetchLUNCount=yes
GPFS 4.1.1-3 behaviour change
Query:
This entry is only valid for DR-enabled clusters with a dedicated quorum node. The support script will issue a warning on all these setups regardless of the installed GPFS version. Please blacklist the particular check to silence the warning.
In GPFS version 4.1.1-3 the cluster manager appointment behaviour in split-brain situations changed. In GPFS version 4.1.1-2 and earlier the cluster manager node must be located at the passive/secondary site, while starting with GPFS version 4.1.1-3 the active/primary site must contain the cluster manager node. Customers updating from pre-4.1.1-3 versions must relocate the cluster manager when upgrading GPFS to 4.1.1-3 or later.
Ans:
When upgrading to GPFS 4.1.1-3 appoint a quorum node on the primary site as the cluster manager. This is a one time change and can be done at any time before, during or after the GPFS upgrade and will not interrupt normal operation.
Verify the location of the cluster manager:
# mmlsmgr
and set the cluster manager to any node on the primary site which is designated as a quorum node. To get a list of nodes execute
# mmlscluster
To change the cluster manager node run
# mmchmgr -c <node>
To silence the warning execute
echo check_dr_gpfs_4_1_1_3 >> /etc/lenovo/supportscript_check_blacklist
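Running the echo command repeatedly appends the same entry each time. If that matters, the entry can be added only when it is not yet present. A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: appends a check name to the blacklist file
# unless an identical line already exists.
# Usage: blacklist_check <check name> [blacklist file]
blacklist_check() {
    check=$1
    file=${2:-/etc/lenovo/supportscript_check_blacklist}
    grep -qxF -- "$check" "$file" 2>/dev/null || printf '%s\n' "$check" >> "$file"
}
```

For this entry the call would be "blacklist_check check_dr_gpfs_4_1_1_3".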
Setting the HANA Parameters
Query:
You have upgraded SAP HANA to version 10 or later, or your SAP HANA system (version 10 or later) was installed with an older image and still uses the previously recommended values for the HANA parameters. Make sure the HANA parameters are set to the recommended values.
The recommended values for SAP HANA Version 10 and later are:
- async_read_submit=on
- async_write_submit_active=auto
- async_write_submit_blocks=all
Ans:
Log in to a HANA server as user <sid>adm and run the following commands:
hdbparam --paramget fileio.async_read_submit
hdbparam --paramget fileio.async_write_submit_active
hdbparam --paramget fileio.async_write_submit_blocks
If the values returned by these commands differ from the recommended values you can set the parameters with the following commands:
hdbparam --paramset fileio.async_read_submit=on
hdbparam --paramset fileio.async_write_submit_active=auto
hdbparam --paramset fileio.async_write_submit_blocks=all