A fix is available
APAR status
Closed as program error.
Error description
**************************************************************
* USERS AFFECTED:
* Systems running PowerHA System Mirror on the
* AIX 7100-05 Technology Level or
* AIX 7200-02 Technology Level with
* rsct.basic.rte at 3.2.3.0.
**************************************************************
* PROBLEM DESCRIPTION:
* An improvement in obtaining adapter state information from
* AHAFS event responses introduced some errors in handling
* internal tracking of monitored IP addresses.
*
* This can result in a core dump of the hagsd process any time
* an IP change occurs at the OS layer. This means the failure
* cannot happen while a cluster is running stable with no
* changes occurring, but it is a risk during startup, shutdown,
* or a failover scenario, and cannot be predicted beyond that.
*
* Two possible core dump stacks which have been seen as a
* result of this issue are as follows:
*
* (dbx) where
* SGroup.SGroup::RemoveRealProvider()(),
*     line 4516 in SGroup.C
* SGroup.SGroup::RemoveRealProvider()(),
*     line 3193 in SGroup.C
* SFailureProtocol::RemoveChangingMembersFromGroup()(),
*     line 1169 in SFailureProtocol.C
* SFailureProtocol::ApprovedPostBroadcast()(),
*     line 552 in SFailureProtocol.C
* SVProtocol::ExecutePostBroadcast()(),
*     line 457 in SVProtocol.C
* SFailureProtocol::ExecutePostBroadcast()(),
*     line 500 in SFailureProtocol.C
* SVProtocol::Approved()(), line 597 in SVProtocol.C
* SVProtocol::Execute()(), line 397 in SVProtocol.C
* SFailureProtocol::Execute()(),
*     line 462 in SFailureProtocol.C
* SProtocolMgr::ExecuteThisProtocol()(),
*     line 1939 in SProtocolMgr.C
* SGroupAdaptMbr::ExecutePendingProtocols()(),
*     line 1237 in SGroupAdaptMbr.C
* SDelayedProtocol::main()(),
*     line 106 in SDelayedProtocol.h
* executeEvent()(), line 306 in DelayedJob.C
* DelayedJob::dispatchAll()(), line 431 in DelayedJob.C
* DispatchControl::Dispatcher()(),
*     line 681 in DispatchControl.C
* main(), line 875 in pgsd.C
*
* (dbx) where
* FindHashLoc()(), line 131 in hash.C
* Hash_insert()(), line 270 in hash.C
* hb_caa_update_global_tbl()(),
*     line 3073 in hb_communication.C
* AHAFSConfigurationHandler::
*     update_global_table_and_construct_events()(),
*     line 110 in CAA_AHAFSConfigurationHandler.C
* unnamed block in AHAFSIPChangeEventHandler::handler()(),
*     line 188 in CAA_AHAFSIPChangeEventHandler.C
* AHAFSIPChangeEventHandler::handler()(),
*     line 188 in CAA_AHAFSIPChangeEventHandler.C
* unnamed block in AHAFSHandler::dispatch()(),
*     line 189 in CAA_AHAFSHandler.C
* AHAFSHandler::dispatch()(),
*     line 189 in CAA_AHAFSHandler.C
* hb_get_event_message(),
*     line 1078 in hb_communication.C
* PMRun()(), line 1436 in PMClient.C
* PMSocket::HandleInput()(), line 68 in PMSocket.C
* DispatchControl::HandleInput()(),
*     line 1211 in DispatchControl.C
* DispatchControl::Dispatcher()(),
*     line 976 in DispatchControl.C
* main(), line 875 in pgsd.C
*
**************************************************************
* RECOMMENDATION:
* Install APAR IJ02843.
* Prior to fix availability, an interim fix is available from
* either
* ftp://aix.software.ibm.com/aix/ifixes/ij02843/
* https://aix.software.ibm.com/aix/ifixes/ij02843/
* Installation of the ifix does not require a reboot; however,
* applying the ifix requires PowerHA to be stopped on the node
* prior to applying the fix. Resources must be moved to
* another node or taken offline (the fix won't install with
* unmanage resources).
**************************************************************
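When triaging a hagsd core, it can help to compare the dbx `where` output against the two stacks quoted above. The following is only a rough sketch: the frame names are copied from this APAR, but the signature labels, the choice of which frames to treat as distinctive, and the function name `match_signature` are assumptions made for illustration. A match suggests, but does not prove, that a core is caused by this defect.

```python
# Frame names copied from the two dbx stacks in this APAR.
# Which frames count as "distinctive" is an assumption, not IBM guidance.
KNOWN_SIGNATURES = {
    "failure protocol path": (
        "SGroup::RemoveRealProvider",
        "SFailureProtocol::RemoveChangingMembersFromGroup",
    ),
    "AHAFS IP change path": (
        "Hash_insert",
        "hb_caa_update_global_tbl",
        "AHAFSIPChangeEventHandler::handler",
    ),
}


def match_signature(dbx_where_output):
    """Return the label of the first signature whose frames all appear, else None."""
    for label, frames in KNOWN_SIGNATURES.items():
        if all(frame in dbx_where_output for frame in frames):
            return label
    return None
```

A plain substring test is used so that differences in line wrapping or line numbers in the dbx output do not prevent a match.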
Local fix
Problem summary
A flaw in the handling of monitored IP changes, introduced with adapter state improvements in RSCT 3.2.3.0, creates a risk of a hagsd core dump in a couple of code paths.
Problem conclusion
Transition of IP lists during a monitoring change has been corrected.
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
IJ02843
Reported component name
RSCT FOR AIX
Reported component ID
5765F07AP
Reported release
323
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Submitted date
2017-12-22
Closed date
2018-02-14
Last modified date
2021-09-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
U881668
Fix information
Fixed component name
RSCT FOR AIX
Fixed component ID
5765F07AP
Applicable component levels
R323 PSY U889770
UP21/09/02 I 1000
PTF to Fileset Mapping
U881668 rsct.basic.rte 3.2.3.2
U883597 rsct.basic.rte 3.2.3.3
U885259 rsct.basic.rte 3.2.3.4
U887098 rsct.basic.rte 3.2.3.5
U889770 rsct.basic.rte 3.2.3.6
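The levels in this APAR can be turned into a quick check of an installed rsct.basic.rte level. This is a minimal sketch under stated assumptions: 3.2.3.0 is the level named under USERS AFFECTED, and 3.2.3.2 is the lowest level in the PTF to Fileset Mapping; treating every 3.2.3.x level at or above 3.2.3.2 as containing the fix is an assumption, and any other level (including 3.2.3.1, which this document does not mention) is reported as unknown. The function names are hypothetical.

```python
# Levels taken from this APAR; using them as thresholds is an assumption.
AFFECTED_LEVEL = (3, 2, 3, 0)     # level named under USERS AFFECTED
FIRST_FIXED_LEVEL = (3, 2, 3, 2)  # lowest level in the PTF to Fileset Mapping


def parse_level(level):
    """Turn a V.R.M.F string such as '3.2.3.2' into a tuple of four ints."""
    parts = tuple(int(p) for p in level.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected V.R.M.F, got {level!r}")
    return parts


def classify_level(level):
    """Return 'affected', 'fixed', or 'unknown' for an rsct.basic.rte level."""
    parsed = parse_level(level)
    if parsed == AFFECTED_LEVEL:
        return "affected"
    # Only levels in the same 3.2.3 release stream are compared to the fix level.
    if parsed[:3] == FIRST_FIXED_LEVEL[:3] and parsed >= FIRST_FIXED_LEVEL:
        return "fixed"
    return "unknown"
```

The level string would typically be read from the output of `lslpp -L rsct.basic.rte` on the node and passed in as, for example, `classify_level("3.2.3.0")`.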
Document Information
Modified date:
03 September 2021