APAR status
Closed as program error.
Error description
In large topologies, the catalog server experiences hung threads during recovery after the process to create replica shards times out.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere eXtreme Scale users in large       *
*                 topologies.                                  *
****************************************************************
* PROBLEM DESCRIPTION: The catalog server experiences hung     *
*                      threads trying to do recovery after     *
*                      replica shard creation times out.       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A recovery path (PlacementServiceImpl.removeShard) was incorrectly called when replica creation timed out with NO_RESPONSE or MessageTimeOutException exceptions. In large topologies, the high volume of calls causes a bottleneck on the catalog server, and the hung threads are detected. See the following example message from a catalog server JVM log or SystemOut.log file:

ThreadMonitor W WSVR0605W: Thread "ORB.thread.pool : 177" (0000102e) has been active for 784792 milliseconds and may be hung. There is/are 1326 thread(s) in total in the server that may be hung.
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
    at com.ibm.ws.objectgrid.locks.RWLock16.startWriting(RWLock16.java:79)
    at com.ibm.ws.objectgrid.catalog.placement.PlacementServiceImpl.removeShard(PlacementServiceImpl.java:1972)
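To check whether a catalog server log shows this symptom, the warning and stack frame quoted above can be searched for with a small script. The following is a minimal sketch in Python, not an IBM-supplied tool: the function names find_hung_threads and blocked_in_remove_shard are hypothetical, and the pattern assumes the WSVR0605W message text matches the example in this APAR.

```python
import re

# Pattern for the WSVR0605W hung-thread warning, modeled on the example
# message in this APAR (assumption: other releases use the same wording).
HUNG_THREAD = re.compile(
    r'WSVR0605W: Thread "(?P<name>[^"]+)" \((?P<id>[0-9a-fA-F]+)\) '
    r'has been active for (?P<ms>\d+) milliseconds and may be hung\.\s+'
    r'There is/are (?P<total>\d+) thread\(s\) in total'
)


def find_hung_threads(log_text):
    """Return one dict per WSVR0605W warning found in log_text."""
    results = []
    for match in HUNG_THREAD.finditer(log_text):
        results.append({
            "thread": match.group("name"),
            "thread_id": match.group("id"),
            "active_ms": int(match.group("ms")),
            "total_hung": int(match.group("total")),
        })
    return results


def blocked_in_remove_shard(log_text):
    """Return True if the text mentions PlacementServiceImpl.removeShard."""
    # Copied logs often wrap long stack frames mid-identifier, so all
    # whitespace is removed before searching for the method name.
    return "PlacementServiceImpl.removeShard" in re.sub(r"\s+", "", log_text)
```

Passing the contents of SystemOut.log to find_hung_threads returns the thread name, active time, and reported hung-thread count for each warning; a large count together with blocked_in_remove_shard returning True is consistent with the problem described here, but is not proof of it.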
Problem conclusion
Apply the interim fix to the catalog servers so that they can recover after the hung threads resolve.
Temporary fix
Comments
APAR Information
APAR number
PI11006
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-02-04
Closed date
2014-02-06
Last modified date
2014-02-06
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702
Applicable component levels
R850 PSY
UP
R860 PSY
UP
Document Information
Modified date:
06 February 2014