A fix is available
APAR status
Closed as program error.
Error description
USERS AFFECTED:
PowerHA SystemMirror v7 systems running RSCT 3.1.4.0 or higher. This level was shipped with AIX 6.1 TL8 and 7.1 TL2 in 2012, and was also available for download from Fix Central.

NOTE: This problem did not have a visible impact in the field until AIX 6.1 TL9 and 7.1 TL3 (2013); the reason for this is not known. The problem involves a timing factor between the native RSCT code and the AIX APIs used to obtain cluster information, so slight changes to either or both layers could have increased the chance of exposure.

PROBLEM DESCRIPTION:
CAA (Cluster-Aware AIX) system calls are blocking, so that the latest data across all nodes is always reported for any client query.

This conflicts with the RSCT Group Services (hags) expectation that only non-blocking calls are made in critical code paths. When CAA was first being developed, several design decisions were made in hags on the assumption that all calls would be non-blocking.

Visible side effects of this design conflict did not begin to occur until cluster protection methods were added to the CAA environment in RSCT 3.1.4.0, primarily the daemon monitoring between critical processes in the RSCT stack. Impacts now include hags mistakenly declaring IBM.ConfigRM unresponsive, and RMC voting timeouts that result in that subsystem being declared unresponsive. Either situation can cause a node panic to protect application resources, because hags believes that the RSCT infrastructure is unreliable.

Regardless of which subsystem is affected, the panic displays the panic string:
"RSCT reboot caused by critical resource protection - Group Services"

RECOMMENDATION:
An interim fix for the latest AIX levels is available from:
https://ibm.biz/PowerHAFixes
(Interim fixes for older levels can be requested from IBM service on an as-needed basis.)
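The conflict between blocking queries and daemon monitoring can be pictured with a small, hypothetical model: a monitor expects a peer to respond within a fixed deadline, while a blocking cluster query sits on the peer's critical path. The class, function, and numbers below are invented for illustration only; they are not RSCT or CAA names or values.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical daemon monitor: a peer that takes longer than
    `deadline_s` to respond is declared unresponsive."""
    deadline_s: float

    def is_unresponsive(self, response_time_s: float) -> bool:
        return response_time_s > self.deadline_s


def response_time(work_s: float, blocking_query_s: float) -> float:
    """Hypothetical model: the time a peer needs to answer is its own work
    plus the time spent waiting on a blocking cluster query."""
    return work_s + blocking_query_s


monitor = Monitor(deadline_s=5.0)  # illustrative value only

# Fast query: the healthy peer answers in time.
fast = monitor.is_unresponsive(response_time(work_s=0.5, blocking_query_s=1.0))

# Slow query (waiting for data from all nodes): the same healthy peer
# misses the deadline and is mistakenly declared unresponsive.
slow = monitor.is_unresponsive(response_time(work_s=0.5, blocking_query_s=8.0))

print(fast, slow)  # False True
```

The point of the sketch is that nothing is wrong with the peer itself; the false "unresponsive" verdict comes only from a fixed deadline that does not account for the blocking wait.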
Local fix
Problem conclusion
Changing the CAA queries to make them non-blocking would be difficult and could not be accomplished any time soon. Instead, all Group Services client timeouts are being recalculated to allow for the necessary CAA query timeouts.
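The approach of recalculating client timeouts can be sketched as a simple timeout budget: each client timeout is extended by the worst-case duration of the blocking CAA queries on that client's critical path. This is a hypothetical illustration; the function name, parameters, and values are assumptions, and the APAR does not publish the actual formula or timeout values used by Group Services.

```python
def recalculated_timeout(base_timeout_s: float,
                         caa_query_timeout_s: float,
                         queries_in_path: int = 1) -> float:
    """Hypothetical: extend a client timeout so that it also covers the
    worst case of every blocking CAA query on the client's critical path."""
    if base_timeout_s <= 0 or caa_query_timeout_s < 0 or queries_in_path < 0:
        raise ValueError("timeouts must be non-negative (base > 0), count >= 0")
    return base_timeout_s + queries_in_path * caa_query_timeout_s


old_timeout = 5.0  # illustrative value only
new_timeout = recalculated_timeout(old_timeout, caa_query_timeout_s=10.0)

# A healthy client whose single CAA query runs to its full timeout.
worst_case_response = 0.5 + 10.0

print(new_timeout)                                             # 15.0
print(worst_case_response > old_timeout,
      worst_case_response > new_timeout)                       # True False
```

Under these assumptions, the original timeout would flag the healthy client, while the recalculated one tolerates the full blocking wait. The trade-off is slower detection of a genuinely hung client.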
Temporary fix
HIPER
Comments
AIX 6100-09 - use RSCT APAR IV80836
AIX 7100-03 - use RSCT APAR IV80836
APAR Information
APAR number
IV80836
Reported component name
RSCT/RMC FOR CS
Reported component ID
5765F07AP
Reported release
320
Status
CLOSED PER
PE
No
HIPER
Yes
Submitted date
2016-01-26
Closed date
2016-01-26
Last modified date
2019-02-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
RSCT/RMC FOR CS
Fixed component ID
5765F07AP
Applicable component levels
R320 PSY U881677
UP19/02/01 I 1000
PTF to Fileset Mapping
PTF      Fileset          Level
U873162  rsct.core.utils  3.2.0.12
U873165  rsct.core.rmc    3.2.0.11
U873167  rsct.basic.rte   3.2.0.8
U876543  rsct.core.utils  3.2.0.13
U876547  rsct.core.rmc    3.2.0.12
U876550  rsct.basic.rte   3.2.0.9
U881677  rsct.core.utils  3.2.0.14
U881678  rsct.core.rmc    3.2.0.13
U881679  rsct.basic.rte   3.2.0.10
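The PTF-to-fileset mapping can be used to check whether a node's RSCT filesets are at one of the listed levels. A minimal sketch follows, assuming that the lowest level listed for each fileset is the minimum that carries this fix (the APAR does not say so explicitly). The installed levels are hypothetical example input, for instance transcribed by hand from `lslpp -L` output; the sketch does not parse command output. Levels are compared numerically, because a plain string comparison would wrongly place 3.2.0.10 before 3.2.0.9.

```python
# Lowest level of each fileset listed in the PTF-to-fileset mapping.
# ASSUMPTION: these are treated as the minimum levels that carry the fix.
MINIMUM_LEVELS = {
    "rsct.core.utils": "3.2.0.12",
    "rsct.core.rmc": "3.2.0.11",
    "rsct.basic.rte": "3.2.0.8",
}


def parse_level(level: str) -> tuple[int, ...]:
    """Turn 'V.R.M.F' into a tuple of ints so that levels compare
    numerically (3.2.0.10 sorts after 3.2.0.9)."""
    return tuple(int(part) for part in level.split("."))


def missing_fix(installed: dict[str, str]) -> list[str]:
    """Return the filesets that are absent or below the assumed minimum."""
    below = []
    for fileset, minimum in MINIMUM_LEVELS.items():
        level = installed.get(fileset)
        if level is None or parse_level(level) < parse_level(minimum):
            below.append(fileset)
    return below


# Hypothetical installed levels for one node.
installed = {
    "rsct.core.utils": "3.2.0.10",
    "rsct.core.rmc": "3.2.0.13",
    "rsct.basic.rte": "3.2.0.10",
}
print(missing_fix(installed))  # ['rsct.core.utils']
```

In this example only rsct.core.utils is below the assumed minimum; rsct.basic.rte 3.2.0.10 correctly passes against 3.2.0.8 because the comparison is numeric.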
Document Information
Modified date:
01 February 2019