APAR status
Closed as program error.
Error description
The replica catalog servers do not recover after a network outage when failure recovery is suspended and then resumed. For example, failover can be suspended with the following xscmd command:

xscmd -c suspend -t failoverAll

The replica catalog servers log the following FFDC:

Exception:java.lang.IllegalStateException
SourceId:com.ibm.ws.objectgrid.replication.StaticReplicationGroupMemberService.processRGMLogSequences
ProbeId:1788
Reporter:java.util.Collections$SynchronizedMap@3b08216d
java.lang.IllegalStateException: Transaction FFFC9600-214D-40D2-E000-00000A90216A was not committed because it did not map to a session.
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.processRGMLogSequences(CommonReplicationGroupMemberService.java:1738)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.dispatchMessage(CommonReplicationGroupMemberService.java:729)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.run(CommonReplicationGroupMemberService.java:873)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable$2.run(SecurityContextRunnable.java:111)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAs(ContextManagerImpl.java:5474)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAsSystem(ContextManagerImpl.java:5600)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.runWithServerContext(SecurityContextRunnable.java:109)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.run(SecurityContextRunnable.java:68)
    at java.lang.Thread.run(Thread.java:724)
Local fix
Problem summary
USERS AFFECTED: WebSphere eXtreme Scale users suspending and resuming failover.
PROBLEM DESCRIPTION: Catalog server failure occurs after a network outage, when failure recovery is suspended.
RECOMMENDATION:

The replica catalog servers do not recover after a network outage when failure recovery is suspended and then resumed after the outage. For example, failover can be suspended with the following xscmd command:

xscmd -c suspend -t failoverAll

The replica catalog servers log the following FFDC:

Exception:java.lang.IllegalStateException
SourceId:com.ibm.ws.objectgrid.replication.StaticReplicationGroupMemberService.processRGMLogSequences
ProbeId:1788
Reporter:java.util.Collections$SynchronizedMap@3b08216d
java.lang.IllegalStateException: Transaction FFFC9600-214D-40D2-E000-00000A90216A was not committed because it did not map to a session.
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.processRGMLogSequences(CommonReplicationGroupMemberService.java:1738)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.dispatchMessage(CommonReplicationGroupMemberService.java:729)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.run(CommonReplicationGroupMemberService.java:873)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable$2.run(SecurityContextRunnable.java:111)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAs(ContextManagerImpl.java:5474)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAsSystem(ContextManagerImpl.java:5600)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.runWithServerContext(SecurityContextRunnable.java:109)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.run(SecurityContextRunnable.java:68)
    at java.lang.Thread.run(Thread.java:724)

Further exceptions can occur, and the replica catalog server becomes unresponsive.
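To check whether a replica catalog server has hit this problem, its FFDC output can be searched for the exception message quoted in this APAR. The following is a minimal sketch in Python, assuming the FFDC logs are available as plain text. The function name and the sample input are illustrative only and are not part of WebSphere eXtreme Scale.

```python
import re

# Message text taken from the FFDC quoted in this APAR.
# The surrounding log layout is an assumption.
SIGNATURE = re.compile(
    r"java\.lang\.IllegalStateException: Transaction (?P<txid>[0-9A-F-]+) "
    r"was not committed because it did not map to a session"
)


def find_uncommitted_transactions(log_text):
    """Return the transaction IDs reported with the APAR's exception message."""
    return [match.group("txid") for match in SIGNATURE.finditer(log_text)]


# Hypothetical sample input built from the message in this APAR.
sample = (
    "Exception:java.lang.IllegalStateException\n"
    "java.lang.IllegalStateException: Transaction "
    "FFFC9600-214D-40D2-E000-00000A90216A was not committed because it "
    "did not map to a session.\n"
)

print(find_uncommitted_transactions(sample))
```

An empty result means only that this particular message was not found in the text that was searched; it does not rule out other failures.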
Problem conclusion
When failover was resumed, the data was not correctly replicated to the replica catalog servers, which left transactions uncommitted. The code was fixed so that data replicates correctly after failover is resumed.
Temporary fix
Comments
APAR Information
APAR number
PI43426
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
860
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2015-06-19
Closed date
2015-07-15
Last modified date
2015-07-15
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702
Applicable component levels
R860 PSY
UP
Document Information
Modified date:
15 July 2015