IBM Support

Host exceeded average power usage threshold

Technote (troubleshooting)


Problem (Abstract)

Users receive an event alert email (nzevent) from a host in an IBM PureData System for Analytics (also known as Netezza or NPS) 100-1 appliance.
The alert is system initiated and is generated while the system is starting up.

Symptom

The following event message is received while the system is starting up:

CRITICAL: NPS system xxxx - host 1004 Needs attention. System initiated.
location:upper host
error string:Host exceeded average power usage threshold
average power usage value: 100 threshold: 95
devSerial:XXXXXXX
event source:System initiated


Cause

The threshold value for average power usage (sysmgr.hostPwrAvgThresholdToRiseEvent) might not be set correctly, for example too low for the normal power draw of the host.


Diagnosing the problem

To check whether the alert is caused by a hardware issue, check the average power usage of the host server through the AMM or with the ipmitool command. If the average power usage is within the safe range or is reported with an OK status, the alert is not caused by a hardware issue and can be resolved by updating the NPS registry parameter.
To retrieve the average power usage, run the following commands on the system.

1. Log in to the host server as the root user and query the average power usage through the AMM.

[root@XXXX]# ssh mm001 fuelg -T blade[5]
system> fuelg -T blade[5]
-pme off
PM Capability: Dynamic Power Measurement with capping
Effective CPU Speed: 2399 MHz
Maximum CPU Speed: 2400 MHz
-pcap 959 (min: 598, max: 959)
Maximum Power: 122
Minimum Power: 107
Average Power: 111
Data captured at 03/01/14 16:40:55
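
When the check needs to be scripted, the average can be pulled out of the AMM output. The following is a minimal sketch, assuming the fuelg output keeps the "Average Power: <number>" layout shown above. The function name amm_avg_power is hypothetical and is not part of any IBM tool.

```shell
# Hypothetical helper: read fuelg output on stdin and print the "Average Power" value.
# Assumes the line format "Average Power: <number>" seen in the sample output.
amm_avg_power() {
    awk -F': *' '/^[[:space:]]*Average Power:/ { print $2 + 0; exit }'
}

# Example (as root on the host; mm001 and blade[5] come from the sample above):
#   ssh mm001 'fuelg -T blade[5]' | amm_avg_power
```

The quotes around the remote command only stop the local shell from treating blade[5] as a file name pattern.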

2. Log in to the SPU as the root user and query the average power usage with the ipmitool command.

[root@svntz001np ~]# ipmitool sdr | grep -i power
Avg Power | 110 Watts | ok
Host Power | 0x00 | ok
Avg Power 2 | disabled | ns
Avg Power 3 | disabled | ns
Avg Power 4 | disabled | ns
Avg Power 5 | disabled | ns
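
To compare the ipmitool reading with the threshold reported in the alert, a small script can help. This is a sketch only: it assumes the "name | value | status" layout shown above and whole-watt readings, and the function names ipmi_avg_power and exceeds_threshold are hypothetical. The exact comparison that the system manager applies is not documented here, so a strict "greater than" is an assumption.

```shell
# Hypothetical helper: read `ipmitool sdr` output on stdin and print the wattage
# of the sensor named exactly "Avg Power" (the numbered sensors are skipped).
ipmi_avg_power() {
    awk -F'|' '$1 ~ /^[[:space:]]*Avg Power[[:space:]]*$/ && $2 ~ /Watts/ { print $2 + 0; exit }'
}

# Hypothetical helper: exit status 0 when reading ($1) is above threshold ($2).
# Assumption: both values are whole numbers of watts.
exceeds_threshold() {
    [ "$1" -gt "$2" ]
}

# Example (as root):
#   watts=$(ipmitool sdr | ipmi_avg_power)
#   exceeds_threshold "$watts" 95 && echo "Avg Power ${watts} W is above threshold"
```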

Resolving the problem

Update the NPS parameter to a value that better matches the average power usage reported by the AMM or ipmitool commands.
The higher the value, the fewer alerts are generated.

1. Pause the system as the nz user:

nzsystem pause
Are you sure you want to pause the system (y|n)? [n] y

2. Change the threshold value:

nzsystem set -arg "sysmgr.hostPwrAvgThresholdToRiseEvent=115"
Are you sure you want to change the system configuration (y|n)? [n] y

3. Resume the system:

nzsystem resume
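
If the threshold is chosen by a script, for example from a measured reading, it can help to validate the value before it reaches nzsystem. The following is a minimal sketch. The function name pwr_threshold_arg is hypothetical, and the nzsystem commands in the comments are the same ones shown in the steps above, each of which still prompts for confirmation.

```shell
# Hypothetical helper: build the -arg string for a new threshold value.
# Rejects anything that is not made up only of digits, so a typo never reaches nzsystem.
pwr_threshold_arg() {
    case "$1" in
        ''|*[!0-9]*)
            echo "threshold must be a whole number of watts" >&2
            return 1
            ;;
    esac
    printf 'sysmgr.hostPwrAvgThresholdToRiseEvent=%s\n' "$1"
}

# Example (as the nz user):
#   nzsystem pause
#   nzsystem set -arg "$(pwr_threshold_arg 115)"
#   nzsystem resume
```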

Document information

More support for: PureData System for Analytics

Software version: 1.0.0

Operating system(s): Platform Independent

Reference #: 1667837

Modified date: 25 March 2014