IBM Support

PI40223: Promotion or new replicas are delayed after failover.

APAR status

  • Closed as program error.

Error description

  • Primary shard promotion or new replicas are delayed due to hung
    threads or MessageTimeOutException; for example, you can
    see the following log activity:
    
    [4/20/15 13:57:01:449 JST] 000000a1 XSThreadPool  W   CWOBJ7853W:
    Detected a hung thread named "XIOPrimaryPool : 0" TID:c2
    WAITING.  Executing since 4/20/2015 13:56:36:169 +0900.
    Stack Trace:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
    com.ibm.ws.xs.xio.actor.impl.FutureImpl.await(FutureImpl.java:278)
    com.ibm.ws.xs.xio.actor.impl.FutureImpl.get(FutureImpl.java:310)
    com.ibm.ws.objectgrid.container.xio.XIORemoteObjectGridContainerImpl._non_existent(XIORemoteObjectGridContainerImpl.java:139)
    com.ibm.ws.objectgrid.replication.PrimaryShardImpl.updateMasterContainerRefs(PrimaryShardImpl.java:6546)
    com.ibm.ws.objectgrid.replication.XIOIDLReplicatedPartition.processContainerRefs(XIOIDLReplicatedPartition.java:306)
    com.ibm.ws.objectgrid.server.container.ContainerActor.doWorkReceive(ContainerActor.java:303)
    com.ibm.ws.objectgrid.server.container.ContainerActor.receive(ContainerActor.java:180)
    com.ibm.ws.xs.xio.actor.impl.XIOReferableImpl.dispatch(XIOReferableImpl.java:110)
    com.ibm.ws.xsspi.xio.actor.XIORegistry.sendToTarget(XIORegistry.java:981)
    com.ibm.ws.xs.xio.transport.channel.XIORegistryRunnable.run(XIORegistryRunnable.java:88)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    com.ibm.ws.objectgrid.thread.XSThreadPool$Worker.run(XSThreadPool.java:309)
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  WebSphere eXtreme Scale users running XIO.  *
    *                                                              *
    ****************************************************************
    * PROBLEM DESCRIPTION: Primary shard promotion or new          *
    *                      replicas are delayed due to hung        *
    *                      threads or MessageTimeOutExceptions.    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    Primary shard promotion or the addition of new replicas is
    delayed by hung threads or MessageTimeOutExceptions because
    incoming placement work (ContainerActor.doWorkReceive)
    proactively pings the remote XIO references that the work
    contains. After a recent failure (such as a network issue or a
    machine failure), that remote call can block until it times out.
    While it is blocked, other incoming placement work cannot
    complete, which might delay shard promotions or the addition of
    new replicas; for example:
    [4/20/15 13:57:01:449 JST] 000000a1 XSThreadPool  W   CWOBJ7853W:
    Detected a hung thread named "XIOPrimaryPool : 0" TID:c2
    WAITING.  Executing since 4/20/2015 13:56:36:169 +0900.
    Stack Trace:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
    com.ibm.ws.xs.xio.actor.impl.FutureImpl.await(FutureImpl.java:278)
    com.ibm.ws.xs.xio.actor.impl.FutureImpl.get(FutureImpl.java:310)
    com.ibm.ws.objectgrid.container.xio.XIORemoteObjectGridContainerImpl._non_existent(XIORemoteObjectGridContainerImpl.java:139)
    com.ibm.ws.objectgrid.replication.PrimaryShardImpl.updateMasterContainerRefs(PrimaryShardImpl.java:6546)
    com.ibm.ws.objectgrid.replication.XIOIDLReplicatedPartition.processContainerRefs(XIOIDLReplicatedPartition.java:306)
    com.ibm.ws.objectgrid.server.container.ContainerActor.doWorkReceive(ContainerActor.java:303)
    com.ibm.ws.objectgrid.server.container.ContainerActor.receive(ContainerActor.java:180)
    com.ibm.ws.xs.xio.actor.impl.XIOReferableImpl.dispatch(XIOReferableImpl.java:110)
    com.ibm.ws.xsspi.xio.actor.XIORegistry.sendToTarget(XIORegistry.java:981)
    com.ibm.ws.xs.xio.transport.channel.XIORegistryRunnable.run(XIORegistryRunnable.java:88)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    com.ibm.ws.objectgrid.thread.XSThreadPool$Worker.run(XSThreadPool.java:309)
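
    The blocking pattern in the stack trace (a pool thread parked in
    a future's get() while waiting on a remote reply) can be
    illustrated with a minimal, generic Java sketch. This is not
    eXtreme Scale code: the class name, the pingDeadContainer()
    helper, the single-thread pool, and the timeout value are all
    hypothetical stand-ins, assumed only to show how one timed-out
    call holds up the work queued behind it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HeadOfLineBlockingSketch {

    // Hypothetical stand-in for a ping to a container that has just
    // failed: the returned future is never completed.
    static CompletableFuture<Boolean> pingDeadContainer() {
        return new CompletableFuture<>();
    }

    // Runs two pieces of "placement work" on one worker thread and
    // returns the events in the order in which they happened.
    static List<String> run(long pingTimeoutMillis) {
        List<String> events = new CopyOnWriteArrayList<>();
        // Assumption: a single worker, so queued work runs strictly in order.
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Work 1 pings a remote reference up front and blocks in get().
        pool.execute(() -> {
            try {
                pingDeadContainer().get(pingTimeoutMillis, TimeUnit.MILLISECONDS);
                events.add("work-1: ping answered");
            } catch (TimeoutException e) {
                events.add("work-1: ping timed out");
            } catch (Exception e) {
                events.add("work-1: " + e);
            }
        });

        // Work 2 needs no remote call, but it cannot start until
        // work 1 has released the worker thread.
        pool.execute(() -> {
            events.add("work-2: promotion processed");
        });

        pool.shutdown();
        try {
            pool.awaitTermination(pingTimeoutMillis + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    public static void main(String[] args) {
        // The second work item is recorded only after the ping times out.
        System.out.println(run(200));
    }
}
```

    In this sketch the second work item is always recorded after the
    timeout of the first, which mirrors the delay described above.
    The real thread pool and timeout settings of the product are not
    stated in this APAR and are not modeled here.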
    

Problem conclusion

  • The proactive ping was removed. Each placement work item now
    handles any failures individually, which avoids the bottleneck.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI40223

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    860

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-05-01

  • Closed date

    2015-05-28

  • Last modified date

    2015-05-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R860 PSY

       UP

Document Information

Modified date:
28 May 2015