PI11006: Threads hang during recovery after the process to create replica shards times out.

APAR status

Closed as program error.

Error description

In large topologies, the catalog server experiences hung
threads during recovery after the process to create replica
shards times out.

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:  WebSphere eXtreme Scale users in large      *
*                  topologies.                                 *
****************************************************************
* PROBLEM DESCRIPTION: The catalog server experiences hung     *
*                      threads trying to do recovery after     *
*                      replica shard creation times out.       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A recovery path (PlacementServiceImpl.removeShard) was
incorrectly called when replica creation timed out with a
NO_RESPONSE or MessageTimeOutException exception. In large
topologies, the high volume of these calls creates a bottleneck
on the catalog server as the calling threads wait for a write
lock, and the thread monitor reports them as hung.
See the following example message from a catalog server JVM log
or SystemOut.log file:
ThreadMonitor W   WSVR0605W: Thread "ORB.thread.pool : 177"
(0000102e) has been active for 784792 milliseconds and may be
hung.  There is/are 1326 thread(s) in total in the server that
may be hung.
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at com.ibm.ws.objectgrid.locks.RWLock16.startWriting(RWLock16.java:79)
at com.ibm.ws.objectgrid.catalog.placement.PlacementServiceImpl.removeShard(PlacementServiceImpl.java:1972)
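
To check whether a catalog server log shows this symptom, the WSVR0605W message and the removeShard frame can be searched for directly. The following is a minimal sketch, not an IBM-provided tool. The names HungThreadReport, find_hung_thread_reports, and mentions_remove_shard are hypothetical, and the pattern assumes the message wording shown in the example above. Whitespace is matched loosely so that the check also works on line-wrapped log excerpts.

```python
import re
from typing import List, NamedTuple


class HungThreadReport(NamedTuple):
    """One WSVR0605W warning (hypothetical helper type)."""
    thread_name: str
    thread_id: str
    active_ms: int
    total_hung: int


# Assumes the message wording shown in this APAR. \s+ also matches
# newlines, so a wrapped message is still recognized.
WSVR0605W = re.compile(
    r'WSVR0605W:\s+Thread\s+"(?P<name>[^"]+)"\s+'
    r'\((?P<tid>[0-9a-fA-F]+)\)\s+has\s+been\s+active\s+for\s+'
    r'(?P<ms>\d+)\s+milliseconds\s+and\s+may\s+be\s+hung\.\s+'
    r'There\s+is/are\s+(?P<total>\d+)\s+thread\(s\)'
)


def find_hung_thread_reports(log_text: str) -> List[HungThreadReport]:
    """Return every WSVR0605W warning found in the log text."""
    return [
        HungThreadReport(m["name"], m["tid"], int(m["ms"]), int(m["total"]))
        for m in WSVR0605W.finditer(log_text)
    ]


def mentions_remove_shard(log_text: str) -> bool:
    """True if a PlacementServiceImpl.removeShard frame appears.

    All whitespace is removed first so that a frame split across
    lines by wrapping is still found.
    """
    collapsed = re.sub(r"\s+", "", log_text)
    return "PlacementServiceImpl.removeShard(" in collapsed


# Excerpt of the example message from this APAR, wrapped as published.
SAMPLE = '''ThreadMonitor W   WSVR0605W: Thread "ORB.thread.pool : 177"
(0000102e) has been active for 784792 milliseconds and may be
hung.  There is/are 1326 thread(s) in total in the server that
may be hung.
at
com.ibm.ws.objectgrid.catalog.placement.PlacementServiceImpl.rem
oveShard(PlacementServiceImpl.java:1972)
'''

if __name__ == "__main__":
    for report in find_hung_thread_reports(SAMPLE):
        print(report)
    print("removeShard frame present:", mentions_remove_shard(SAMPLE))
```

A large hung-thread count together with a removeShard frame is consistent with the problem described here, but it does not confirm it on its own. Compare the full stack trace with the example above before applying the fix.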

Problem conclusion

Apply the interim fix to the catalog servers so that they can
recover after the hung threads resolve.

Temporary fix

Comments

APAR Information

APAR number
PI11006
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-02-04
Closed date
2014-02-06
Last modified date
2014-02-06

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702

Applicable component levels

R850 PSY
UP
R860 PSY
UP

Document Information

Modified date:
06 February 2014
