IBM Support

CPU temperature is too high

Troubleshooting


Problem

nzhealthcheck reports rule SHC900: CPU temperature is too high.

Symptom

The nzhealthcheck report shows "CPU temperature is too high", as in the following example:


Rule : SHC900
Issue Detected : CPU temperature is too high
Severity : High
Components : rack1.host1.cpu1 (from ibm_host) - State is
rack1.host1.cpu2 (from ibm_host) - State is
rack1.host1.cpu3 (from ibm_host) - State is
rack1.host1.cpu4 (from ibm_host) - State is
rack2.host1.cpu1 (from ibm_host) - State is
rack2.host1.cpu2 (from ibm_host) - State is
rack2.host1.cpu3 (from ibm_host) - State is
rack2.host1.cpu4 (from ibm_host) - State is

Expert's Advice :

Please check your environment status, including server room temperature.
If server room temperature is correct, and the problem persists,
please contact IBM Support.

Cause


Intel Xeon 51xx and higher CPUs do not have a thermal diode to query CPU temperature.
Instead, Intel uses a digital thermal sensor and Platform Environment Control Interface (PECI) clock readings to track CPU thermal events.

However, PECI data cannot be translated into a numeric temperature value.
As a result, no numeric CPU temperature values are displayed.

Resolving The Problem

To check whether the issue is real, gather the output of the following command.

As the root user:

# ipmitool sdr elist

Do this on both hosts and look for the Ambient Temp and CPU-related rows.


HA1
Ambient Temp     | 32h | ok  | 12.1 | 23 degrees C
CPU 1 Temp | 98h | ok | 3.1 |
CPU 2 Temp | 99h | ok | 3.2 |
CPU 3 Temp | 9Ah | ok | 3.3 |
CPU 4 Temp | 9Bh | ok | 3.4 |

HA2
Ambient Temp     | 32h | ok  | 12.1 | 23 degrees C
CPU 1 Temp | 98h | ok | 3.1 |
CPU 2 Temp | 99h | ok | 3.2 |
CPU 3 Temp | 9Ah | ok | 3.3 |
CPU 4 Temp | 9Bh | ok | 3.4 |

Note:

As the example above shows, the numerical values as well as the status (ok) are reported, which indicates that the CPU temperature issue in the nzhealthcheck report is a false positive.
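
The manual check of the Ambient Temp and CPU Temp rows could also be scripted. The following is only a minimal sketch, not an IBM-provided tool: the function name check_temp_rows is hypothetical, and it assumes the sensor names and the pipe-separated layout match the sample output shown above. It reads `ipmitool sdr elist` output on standard input, prints each matching row with its status, and returns a non-zero exit code if any of those rows has a status other than "ok".

```shell
# Hypothetical helper (not part of ipmitool or nzhealthcheck).
# Reads "ipmitool sdr elist" output on stdin and reports the status of the
# Ambient Temp and CPU Temp rows. Assumes the column layout shown in the
# sample output: name | sensor id | status | entity | reading
check_temp_rows() {
    awk -F'|' '
        /Ambient Temp|CPU [0-9]+ Temp/ {
            name = $1
            status = $3
            # Trim leading and trailing blanks from both fields
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status == "ok") {
                printf "OK      %s\n", name
            } else {
                printf "CHECK   %s (status: %s)\n", name, status
                bad = 1
            }
        }
        # Exit code 1 if any matching row was not "ok", otherwise 0
        END { exit bad + 0 }
    '
}

# Usage, as the root user on each host:
#   ipmitool sdr elist | check_temp_rows
```

If every row is reported as OK and the ambient temperature is in the expected range, the result is consistent with the false positive described in this note. A CHECK line means that the sensor status should be investigated further.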

Document Information

Modified date:
17 October 2019

UID

swg21982911