Troubleshooting
Problem
The server begins to power on and then suddenly powers off, powers off unexpectedly, or does not power on. One or more product-dependent conditions are observed on the LightPath display panel, in Integrated Management Module (IMM) events, or in Advanced Management Module (AMM) events, as listed under Symptom.
Resolving The Problem
Source
RETAIN tip: H207008
Symptom
The server begins to power on and then suddenly powers off, powers off unexpectedly, or does not power on, with one or more of the following observed conditions (product dependent):
LightPath display panel:
- Checkpoint alternates between FA and XX. (where: "XX" is another set of characters)
- BRD LightPath LED may be illuminated.
- Fault Light Emitting Diode (LED) may be illuminated.
Integrated Management Module (IMM) event:
- Sensor 'CPU x VRD' has transitioned to non-recoverable. (where: x can be 1, 2, 3, or 4)
- Sensor 'CPU CACHE VRD' has transitioned to non-recoverable.
- Sensor 'system board Fault' has transitioned to critical from a less severe state.
- Sensor 'CPU x y VIO' has transitioned to non-recoverable. (where: x, y can be 2, 3 or 3, 4 as seen on the IBM System x3850 X5 and x3950 X5)
Advanced Management Module (AMM) event:
- system board voltage fault
*** IMPORTANT (when symptom is encountered) ***
Do not cycle the power, and do not reseat blades and attempt to power on, without Product Engineering approval.
Affected Configurations
The system can be any of the following IBM servers:
- BladeCenter HS22V, Type 1949, any model
- BladeCenter HS22V, Type 7871, any model
- BladeCenter HX5, Type 1909, any model
- BladeCenter HX5, Type 1910, any model
- BladeCenter HX5, Type 7872, any model
- BladeCenter HX5, Type 7873, any model
- System x3550 M3, Type 4254, any model
- System x3550 M3, Type 7944, any model
- System x3650 M3, Type 4255, any model
- System x3650 M3, Type 7945, any model
- System x3850 X5, Type 7143, any model
- System x3850 X5, Type 7145, any model
- System x3850 X5, Type 7146, any model
- System x3850 X5, Type 7191, any model
- System x3950 X5, Type 7143, any model
- System x3950 X5, Type 7145, any model
This tip is not software specific.
This tip is not option specific.
The system has the symptom described above.
Solution
If the server fails to power on and continues to exhibit the documented symptoms, replace the microprocessor board or system board.
Contact the IBM Service Provider or the appropriate Support Center for the corresponding geography.
For instance, in the U.S., call 800-IBM-SERV (800-426-7378).
Note: The IBM Directory of Worldwide Contacts is available from the following URL:
http://www.ibm.com/planetwide/
Workaround
To help prevent the symptoms from occurring:
- When possible, reduce the number of AC and DC power cycles. On the x3850 X5 and HX5, also reduce the number of restarts when possible.
- Prevent the 'intel_idle' driver from loading if any of the operating systems described in the following note is, or will be, in use:
Some versions of Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) distributions have a built-in driver ('intel_idle') which by default ignores any C-state limits set in the Basic Input/Output System (BIOS) or Unified Extensible Firmware Interface (UEFI).
Add or edit the following kernel parameter in the bootloader configuration file to prevent the 'intel_idle' driver from loading and to use the UEFI settings for the C-state limit:
intel_idle.max_cstate=0
For more details, refer to RETAIN Tip H207000 (MIGR-5091901) at the following URL:
http://www.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5091901
- Update the Unified Extensible Firmware Interface (UEFI) per the following product list:
Note: Updating UEFI firmware by itself will not change the Advanced Configuration and Power Interface (ACPI) C-state limit needed to resolve this issue (see Additional Information). To ensure the ACPI C-state limit is set correctly after updating UEFI to the new level, either Load Defaults or set the proper ACPI C-state limit by continuing the workaround actions.
- IBM BladeCenter HS22V (1949, 7871): Version 1.19 Build ID: P9E158A
- IBM BladeCenter HX5 (1909, 1910, 7872, 7873): Version 1.77 Build ID: HIE177A
- IBM System x3850 X5 (7143, 7145, 7146, 7191): Version 1.77 Build ID: G0E177A
- IBM System x3550 M3 (4254, 7944): Version 1.17 Build ID: D6E159A
- IBM System x3650 M3 (4255, 7945): Version 1.17 Build ID: D6E159A
The file is available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL: http://www.ibm.com/support/fixcentral/
- Set the ACPI C-state limit to ACPI C2 using one of the following two methods:
Method 1 - F1 Setup:
- Enter UEFI Setup by pressing F1 after the IBM System x Server Firmware logo screen appears when the system is powered on or restarted.
- Select System Settings --> Operating Modes.
- Change the Operating Mode to Custom Mode.
- Select System Settings --> Processors.
- Set the ACPI C-state Limit to ACPI C2.
- Press Escape (Esc) three times, then press 'Y' to save the settings and restart the server.
Note: Servers that do not have the 'ACPI C-state Limit' menu selection effectively have the ACPI C-state limited to ACPI C2 by default. Although no action is needed in F1 Setup, the 'intel_idle' driver should be prevented from running if applicable as described previously.
Method 2 - Advanced Settings Utility (ASU):
- Install IBM ASU locally (alternately: run ASU remotely).
http://www.ibm.com/support/entry/portal/docdisplay?lndocid=tool-asu
- Execute the following ASU commands from a command prompt:
- asu64 set UEFI.OperatingModes "Custom Mode"
Some systems may use an alternate ASU command:
- asu64 set OperatingModes.ChooseOperatingMode "Custom Mode"
- asu64 set UEFI.PackageCState "ACPI C2"
Some systems may use an alternate ASU command:
- asu64 set Processors.PackageACPIC-StateLimit "ACPI C2"
Note: If the Workaround steps are performed on a UEFI version earlier than those listed in Step 3 (the UEFI update step) and the UEFI default settings are ever reloaded, the Workaround steps will have to be repeated.
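The ASU commands in Method 2 can be assembled by a small script so that they are reviewed before being run. The following Python sketch is illustrative only: the helper name build_asu_commands and the alternate_names switch are hypothetical, and the script cannot tell which set of setting names a particular server accepts, so that choice is left to the caller. The setting names and values themselves are copied from the commands above.

```python
from typing import List

# Setting names and values copied from the ASU commands in Method 2.
# Which pair of names a given server accepts depends on the system.
PRIMARY = [
    ("UEFI.OperatingModes", "Custom Mode"),
    ("UEFI.PackageCState", "ACPI C2"),
]
ALTERNATE = [
    ("OperatingModes.ChooseOperatingMode", "Custom Mode"),
    ("Processors.PackageACPIC-StateLimit", "ACPI C2"),
]


def build_asu_commands(alternate_names: bool = False) -> List[List[str]]:
    """Return asu64 argument lists: operating mode first, then C-state limit."""
    settings = ALTERNATE if alternate_names else PRIMARY
    return [["asu64", "set", name, value] for name, value in settings]


def format_command(argv: List[str]) -> str:
    """Render an argument list as one line, double-quoting values with spaces."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in argv)


if __name__ == "__main__":
    # Print rather than execute, so the commands can be checked first.
    for argv in build_asu_commands():
        print(format_command(argv))
```

Keeping the commands as argument lists means they could later be handed to subprocess.run without shell quoting issues; whether to fall back to the alternate names after a failure is a decision this sketch deliberately does not make.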
Additional Information
In rare cases, processor Voltage Regulator Device (VRD) faults have been observed when a processor transitions between C-state 0 (full power) and deep C-states.
CPU VRD faults that occur due to state transitions can be greatly reduced or eliminated by having UEFI limit how deep a C-state is allowed.
The listed UEFI versions change the default ACPI C-state limit to C2 for processors that support this setting.
Notes:
1. IBM Servers are designed to perform optimally in a steady state power environment. Excessive AC and DC power cycles can stress system components, which may lead to premature failure of the VRDs.
IBM System x3850 X5 and HX5 Blades perform DC power cycles during any system restart or warm boot. For example, a DC power cycle occurs following a 'Ctrl+Alt+Del' or operating system (OS) restart. In addition to AC and DC power cycles, system restarts (warm boots) should be avoided where possible on IBM System x3850 X5 and HX5 Blades.
2. Enabling C-states in UEFI Setup maps operating system ACPI C-state requests to Intel idle states to reduce idle processor power consumption. On IBM X5 family servers with Intel E7 processors, ACPI C1, C2, and C3 map to Intel C1, C3, and C6 states. On IBM X5 family servers with Intel 6500/7500 processors, ACPI C1 and C3 map to Intel C1 and C3 states, and ACPI C2 is not available. OS software may override the UEFI ACPI mapping; for example, the intel_idle driver in Linux kernels invokes Intel idle states directly.
3. Flashing the UEFI to the new level alone will not change the ACPI C-state limit, because UEFI settings are preserved between flash updates by design and new defaults are not loaded automatically after a UEFI update. To ensure the ACPI C-state limit is set correctly after updating UEFI to the new level, either Load Defaults or use the Workaround procedure to set the proper ACPI C-state limit.
4. Some newer versions of Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) distributions have a built-in driver ('intel_idle') which will ignore any C-state limits imposed by Basic Input/Output System (BIOS)/Unified Extensible Firmware Interface (UEFI).
Add the kernel parameter shown below to the bootloader configuration file to prevent the 'intel_idle' driver from loading and to use the UEFI settings for the C-state limit:
intel_idle.max_cstate=0
For more details, refer to RETAIN Tip H207000 (MIGR-5091901) at the following URL:
http://www.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5091901
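After the bootloader change and a restart, it can be useful to confirm that the parameter actually reached the running kernel. The following Python sketch is one possible check, not an IBM-provided tool: the function names are hypothetical, and it assumes a Linux system that exposes /proc/cmdline and, on kernels with cpuidle sysfs support, /sys/devices/system/cpu/cpuidle/current_driver. If the sysfs file is absent, the driver name is simply reported as unknown.

```python
from pathlib import Path


def intel_idle_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains intel_idle.max_cstate=0."""
    # Compare whole whitespace-separated tokens so that a value such as
    # intel_idle.max_cstate=01 is not mistaken for a match.
    return "intel_idle.max_cstate=0" in cmdline.split()


def current_cpuidle_driver() -> str:
    """Return the active cpuidle driver name, or "" if sysfs does not expose it."""
    path = Path("/sys/devices/system/cpu/cpuidle/current_driver")
    try:
        return path.read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # not a Linux system, or /proc is unavailable
    print("parameter present:", intel_idle_disabled(cmdline))
    print("cpuidle driver:", current_cpuidle_driver() or "unknown")
```

If the parameter is present but the reported driver is still intel_idle, the bootloader entry that was edited is probably not the one that was booted.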
Document Location
Worldwide
Document Information
Modified date:
19 April 2023
UID
ibm1MIGR-5091926