Troubleshooting
Problem
Host reports rebooting
Symptom
One or both N3001-001 hosts reboot unexpectedly.
Cause
Power to the host power supplies is interrupted.
Environment
N3001-001 IBM Netezza PureData System for Analytics
Diagnosing The Problem
Check the output of last reboot, as in the following example, to determine the frequency of the reboots.
[root@nzdev01 ~]# last reboot
system boot 2.6.32-431.17.1. Thu Feb 18 03:05 - 15:57 (12:51)
system boot 2.6.32-431.17.1. Wed Feb 17 07:07 - 15:57 (1+08:50)
system boot 2.6.32-431.17.1. Tue Feb 16 02:32 - 15:57 (2+13:24)
Check the output of ipmitool sel list for informational messages, warnings, or errors around the time of the reboots. For example:
Power Supply #0x70 | Power Supply AC lost | Asserted
Note that if power from both power supplies is lost at the same time, the system will not be able to record the loss of both power supplies. If the timing is just right, the system may not be able to record the loss of either power supply, but this is rare.
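The SEL check above can be narrowed to the power-loss entries with a small filter. The following is a minimal sketch, not part of any Netezza tooling: sel_ac_lost is a hypothetical helper name, and the sketch assumes the pipe-separated layout shown in the example line, with the assertion state in the last column.

```shell
# sel_ac_lost (hypothetical helper): read `ipmitool sel list` output on stdin
# and print only the "Power Supply AC lost" entries whose state is Asserted.
# Assumes pipe-separated columns with the state in the last column.
sel_ac_lost() {
    awk -F'|' '
        /Power Supply AC lost/ {
            state = $NF
            gsub(/^[ \t]+|[ \t]+$/, "", state)   # trim surrounding whitespace
            if (state == "Asserted") print       # skips Deasserted entries
        }
    '
}

# Usage on a live host (run as root):
#   ipmitool sel list | sel_ac_lost
```

Compare the timestamps of the printed entries with the boot times reported by last reboot.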
Look for any environmental causes for power loss, such as the data center having power problems.
If the host /var/log/messages contains no messages about a fencing event and the IPMI log contains power loss information, suspect an external power cause.
If the host is connected to a UPS, find out whether the UPS runs a self-test at the time of the reboots. During a self-test, a weak UPS battery may be unable to sustain the power requirement of the host, and the host will reboot.
This can be spotted fairly easily if the reboots occur at the same, or very nearly the same, time each day.
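One way to check the system log for fencing activity is a case-insensitive search, sketched below. The helper name fencing_lines is hypothetical, and the search pattern (fenc or stonith) is an assumption about how fencing events are worded in the log; adjust it to match the messages your cluster software actually writes.

```shell
# fencing_lines FILE (hypothetical helper): print log lines that mention
# fencing. The pattern is an assumption, not a documented message format.
fencing_lines() {
    grep -E -i 'fenc|stonith' "$1"
}

# Usage on a live host:
#   fencing_lines /var/log/messages
```

No output around the time of a reboot, combined with AC lost entries in the IPMI log, is consistent with an external power cause.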
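The time-of-day comparison can be sketched as a tally of reboots per hour. This is a minimal illustration under stated assumptions: reboots_by_hour is a hypothetical helper name, and the sketch assumes that the first HH:MM field on each "system boot" line is the boot time, as in the last reboot example above.

```shell
# reboots_by_hour (hypothetical helper): read `last reboot` output on stdin
# and print how many boots fell in each hour of the day.
reboots_by_hour() {
    awk '
        /system boot/ {
            # The first HH:MM field on the line is taken as the boot time.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9][0-9]:[0-9][0-9]$/) {
                    count[substr($i, 1, 2)]++
                    break
                }
            }
        }
        END {
            for (h in count) printf "%s:00 %d\n", h, count[h]
        }
    ' | sort
}

# Usage on a live host:
#   last reboot | reboots_by_hour
```

A single hour that accounts for most of the reboots suggests a scheduled event, such as a UPS self-test.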
Resolving The Problem
If the host is not connected to a UPS, it is strongly advised to obtain a UPS for consistent power.
If the host is connected to a UPS, have the health of the UPS checked or, if it is a smart UPS, check any logs that are available.
If the host is not connected to a UPS and a UPS is available, connect the host to it to determine whether the issue is an incoming power issue rather than a power supply issue.
If a UPS is not available, suspect that the PDU may have an issue.
Document Information
Modified date:
17 October 2019
UID
swg21979502