IBM Support

PI43426: Catalog servers fail after a network outage, when failure recovery is suspended.

APAR status

  • Closed as program error.

Error description

  • The replica catalog servers do not recover after a network
    outage when failure recovery is suspended and then resumed.
    
    For example, failover can be suspended with the following
    xscmd command: xscmd -c suspend -t failoverAll
    
    The replica catalog servers log the following FFDC.
    Exception:java.lang.IllegalStateException
    SourceId:com.ibm.ws.objectgrid.replication.StaticReplicationGroupMemberService.processRGMLogSequences ProbeId:1788
    Reporter:java.util.Collections$SynchronizedMap@3b08216d
    java.lang.IllegalStateException: Transaction FFFC9600-214D-40D2-E000-00000A90216A was not committed because it did not map to a session.
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.processRGMLogSequences(CommonReplicationGroupMemberService.java:1738)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.dispatchMessage(CommonReplicationGroupMemberService.java:729)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.run(CommonReplicationGroupMemberService.java:873)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable$2.run(SecurityContextRunnable.java:111)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAs(ContextManagerImpl.java:5474)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAsSystem(ContextManagerImpl.java:5600)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.runWithServerContext(SecurityContextRunnable.java:109)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.run(SecurityContextRunnable.java:68)
    at java.lang.Thread.run(Thread.java:724)
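
To check whether a replica catalog server has hit this problem, the FFDC text can be searched for the message quoted above. The following is a minimal sketch and not part of the APAR or of any IBM tooling: the function name find_uncommitted_transactions is hypothetical, and the sketch assumes the FFDC content has already been read into a string. It collapses line wraps first, so it also matches entries whose message is broken across several lines.

```python
import re
from typing import List

# Message text taken from the FFDC entry quoted in this APAR.
SIGNATURE = re.compile(
    r"java\.lang\.IllegalStateException: Transaction\s+"
    r"(?P<txid>[0-9A-Fa-f-]+)\s+was not committed because\s+"
    r"it did not map to a session"
)


def find_uncommitted_transactions(ffdc_text: str) -> List[str]:
    """Return the transaction IDs named in matching FFDC entries.

    Hypothetical helper for illustration only.
    """
    # Collapse line wraps and indentation so a wrapped message still matches.
    normalized = re.sub(r"\s+", " ", ffdc_text)
    return [match.group("txid") for match in SIGNATURE.finditer(normalized)]


# Sample input copied from the wrapped FFDC excerpt in this APAR.
sample = """
    java.lang.IllegalStateException: Transaction
    FFFC9600-214D-40D2-E000-00000A90216A was not committed because
    it did not map to a session.
"""

print(find_uncommitted_transactions(sample))
# prints ['FFFC9600-214D-40D2-E000-00000A90216A']
```

An empty result only means that this particular message was not found in the supplied text; it does not show that the server is unaffected.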
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  WebSphere eXtreme Scale users suspending    *
    *                  and resuming failover.                      *
    ****************************************************************
    * PROBLEM DESCRIPTION: Catalog server failure occurs after a   *
    *                      network outage, when failure recovery   *
    *                      is suspended.                           *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The replica catalog servers do not recover after a network
    outage when failure recovery is suspended and then resumed after
    the outage.
    For example, failover can be suspended with the following xscmd
    command: xscmd -c suspend -t failoverAll
    The replica catalog servers log the following FFDC.
    Exception:java.lang.IllegalStateException
    SourceId:com.ibm.ws.objectgrid.replication.StaticReplicationGroupMemberService.processRGMLogSequences ProbeId:1788
    Reporter:java.util.Collections$SynchronizedMap@3b08216d
    java.lang.IllegalStateException: Transaction FFFC9600-214D-40D2-E000-00000A90216A was not committed because it did not map to a session.
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.processRGMLogSequences(CommonReplicationGroupMemberService.java:1738)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.dispatchMessage(CommonReplicationGroupMemberService.java:729)
    at com.ibm.ws.objectgrid.replication.CommonReplicationGroupMemberService.run(CommonReplicationGroupMemberService.java:873)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable$2.run(SecurityContextRunnable.java:111)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAs(ContextManagerImpl.java:5474)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAsSystem(ContextManagerImpl.java:5600)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.runWithServerContext(SecurityContextRunnable.java:109)
    at com.ibm.ws.objectgrid.util.security.SecurityContextRunnable.run(SecurityContextRunnable.java:68)
    at java.lang.Thread.run(Thread.java:724)
    Further exceptions can occur, and the replica catalog server
    becomes unresponsive.
    

Problem conclusion

  • When failover was resumed, the data was not correctly replicated
    to the replica catalog servers, resulting in uncommitted
    transactions. The code was fixed to replicate correctly after
    failover is resumed.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI43426

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    860

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-06-19

  • Closed date

    2015-07-15

  • Last modified date

    2015-07-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R860 PSY

       UP

Document Information

Modified date:
15 July 2015