IBM Support

PI11006: Threads hang during recovery after the process to create replica shards times out.


APAR status

  • Closed as program error.

Error description

  • In large topologies, the catalog server experiences hung
    threads during recovery after the process to create replica
    shards times out.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  WebSphere eXtreme Scale users in large      *
    *                  topologies.                                 *
    ****************************************************************
    * PROBLEM DESCRIPTION: The catalog server experiences hung     *
    *                      threads during recovery after           *
    *                      replica shard creation times out.       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A recovery path (PlacementServiceImpl.removeShard) was
    incorrectly called when replica creation timed out with
    NO_RESPONSE or MessageTimeOutException exceptions. Each call
    waits to acquire a write lock (see RWLock16.startWriting in the
    stack trace below), so in large topologies the high volume of
    calls creates a bottleneck on the catalog server, and the
    thread monitor reports hung threads.
    See the following example message from a catalog server JVM log
    or SystemOut.log file:
    ThreadMonitor W   WSVR0605W: Thread "ORB.thread.pool : 177"
    (0000102e) has been active for 784792 milliseconds and may be
    hung.  There is/are 1326 thread(s) in total in the server that
    may be hung.
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
      at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
      at com.ibm.ws.objectgrid.locks.RWLock16.startWriting(RWLock16.java:79)
      at com.ibm.ws.objectgrid.catalog.placement.PlacementServiceImpl.removeShard(PlacementServiceImpl.java:1972)
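    The failure mode above can be sketched in Java. This is a
    hypothetical model, not the WebSphere eXtreme Scale source: the
    class ReplicaTimeoutSketch, the Outcome enum, onReplicaResult,
    and simulate are invented for illustration. The "fixed" branch
    assumes the interim fix stops routing NO_RESPONSE and
    MessageTimeOutException timeouts into the write-locked recovery
    path; how the real fix handles timed-out replicas is not stated
    in this APAR. The sketch uses
    java.util.concurrent.locks.ReentrantReadWriteLock, the lock type
    that appears in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the catalog-server behavior described in PI11006.
public class ReplicaTimeoutSketch {

    // Hypothetical outcomes of a replica-creation request (names are illustrative).
    enum Outcome { SUCCESS, NO_RESPONSE, MESSAGE_TIMEOUT, FAILED }

    private final ReentrantReadWriteLock placementLock = new ReentrantReadWriteLock();
    private int removeShardCalls = 0; // guarded by placementLock

    // Stand-in for a recovery path such as removeShard: it needs the exclusive
    // write lock, so concurrent callers are serialized one at a time.
    void removeShard(String shardId) {
        placementLock.writeLock().lock();
        try {
            removeShardCalls++; // real recovery work would run here while others wait
        } finally {
            placementLock.writeLock().unlock();
        }
    }

    // fixed == false: every non-success outcome, including timeouts, enters removeShard.
    // fixed == true: assumed fix shape, timeouts skip the write-locked recovery path.
    void onReplicaResult(String shardId, Outcome outcome, boolean fixed) {
        if (outcome == Outcome.SUCCESS) {
            return;
        }
        boolean timedOut = outcome == Outcome.NO_RESPONSE || outcome == Outcome.MESSAGE_TIMEOUT;
        if (timedOut && fixed) {
            return;
        }
        removeShard(shardId);
    }

    int removeShardCalls() {
        placementLock.readLock().lock(); // read under the lock for memory visibility
        try {
            return removeShardCalls;
        } finally {
            placementLock.readLock().unlock();
        }
    }

    // Submits the given number of timeouts and genuine failures from a thread pool
    // and returns how many times the write-locked recovery path was entered.
    public static int simulate(boolean fixed, int timeouts, int failures) {
        ReplicaTimeoutSketch catalog = new ReplicaTimeoutSketch();
        ExecutorService pool = Executors.newFixedThreadPool(16);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < timeouts; i++) {
                Outcome o = (i % 2 == 0) ? Outcome.NO_RESPONSE : Outcome.MESSAGE_TIMEOUT;
                String shard = "shard-" + i;
                results.add(pool.submit(() -> catalog.onReplicaResult(shard, o, fixed)));
            }
            for (int i = 0; i < failures; i++) {
                String shard = "failed-" + i;
                results.add(pool.submit(() -> catalog.onReplicaResult(shard, Outcome.FAILED, fixed)));
            }
            for (Future<?> f : results) {
                f.get(); // wait for every task to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return catalog.removeShardCalls();
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + simulate(false, 1000, 3));
        System.out.println("after fix:  " + simulate(true, 1000, 3));
    }
}
```

    With 1,000 timeouts and 3 genuine failures, the unguarded path
    takes the write lock 1,003 times, while the guarded path takes
    it only 3 times. Because every entry is serialized on one write
    lock, a burst of timeouts in a large topology leaves many
    threads queued in WriteLock.lock, which is the parked state
    shown in the stack trace.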
    

Problem conclusion

  • Apply the interim fix so that catalog servers can recover after
    the hung threads resolve.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI11006

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    850

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-02-04

  • Closed date

    2014-02-06

  • Last modified date

    2014-02-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R850 PSY UP

  • R860 PSY UP


Document Information

Modified date:
06 February 2014