Fixes are available
8.0.0.3: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 3
8.5.5.2: WebSphere Application Server V8.5.5 Fix Pack 2
8.0.0.4: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 4
8.5.5.3: WebSphere Application Server V8.5.5 Fix Pack 3
8.5.5.4: WebSphere Application Server V8.5.5 Fix Pack 4
8.5.5.5: WebSphere Application Server V8.5.5 Fix Pack 5
8.5.5.6: WebSphere Application Server V8.5.5 Fix Pack 6
8.5.5.7: WebSphere Application Server V8.5.5 Fix Pack 7
8.5.5.8: WebSphere Application Server V8.5.5 Fix Pack 8
8.5.5.9: WebSphere Application Server V8.5.5 Fix Pack 9
8.5.5.10: WebSphere Application Server V8.5.5 Fix Pack 10
8.0.0.5: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 5
8.5.5.11: WebSphere Application Server V8.5.5 Fix Pack 11
8.5.5.12: WebSphere Application Server V8.5.5 Fix Pack 12
8.5.5.13: WebSphere Application Server V8.5.5 Fix Pack 13
8.5.5.14: WebSphere Application Server V8.5.5 Fix Pack 14
8.5.5.15: WebSphere Application Server V8.5.5 Fix Pack 15
8.5.5.17: WebSphere Application Server V8.5.5 Fix Pack 17
8.5.5.20: WebSphere Application Server V8.5.5.20
8.5.5.18: WebSphere Application Server V8.5.5 Fix Pack 18
8.5.5.19: WebSphere Application Server V8.5.5 Fix Pack 19
8.5.5.16: WebSphere Application Server V8.5.5 Fix Pack 16
8.5.5.21: WebSphere Application Server V8.5.5.21
APAR status
Closed as program error.
Error description
Excessive loading of job status objects by scheduler leads to OutOfMemoryError and other issues upon endpoint server restart.
Local fix
Problem summary
USERS AFFECTED: All users of WebSphere Extended Deployment Compute Grid.

PROBLEM DESCRIPTION: Excessive heap memory consumed in the scheduler, especially when a CG endpoint server is started or restarted, possibly leading to an OutOfMemoryError. Also, there is a timing issue flowing job logs back to wsgrid clients that can lead to a problem dispatching a job as well as reporting its correct status.

RECOMMENDATION:

When an endpoint server is started or restarted, the scheduler server queries its internal database tables for information about job executions associated with the endpoint being started. This query was loading more into memory than necessary. When the tables held a large number of completed jobs, this could lead to excessive resource usage, processing slowdown, and even an OutOfMemoryError.

For the second problem, a job coming in from the WSGrid interface could fail to dispatch properly if a timing window was hit. The problem also involved the mechanism by which the scheduler requests job log parts from the dispatch endpoint to stream back to the WSGrid client. This could lead to the job not being dispatched, as well as the job status not being maintained correctly, so that the job could appear to be "stuck" in submitted state rather than moving to the restartable state upon failure. This would tend to happen if, for example, there were a slowdown (e.g. a GC cycle) on the endpoint server to which the job was dispatched right after the dispatch had been initiated by the scheduler.
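To make the first problem concrete, the following is a minimal sketch of how an unfiltered per-endpoint query scales with the number of completed jobs. It is not the product's code or schema: the `job_status` table, its columns, the state names other than "submitted", and both function names are hypothetical, and an in-memory SQLite database stands in for the scheduler's internal tables.

```python
import sqlite3

# Hypothetical schema for illustration only; the real scheduler tables
# are not described in this APAR.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_status (job_id TEXT PRIMARY KEY, endpoint TEXT, state TEXT)"
)

# Many completed jobs plus a few that are still in flight.
rows = [(f"job{i}", "endpointA", "ended") for i in range(1000)]
rows += [("job1000", "endpointA", "executing"),
         ("job1001", "endpointA", "submitted"),
         ("job2000", "endpointB", "executing")]
conn.executemany("INSERT INTO job_status VALUES (?, ?, ?)", rows)


def load_all_for_endpoint(conn, endpoint):
    """Load every job row for the endpoint, including completed jobs."""
    return conn.execute(
        "SELECT job_id, state FROM job_status WHERE endpoint = ?",
        (endpoint,),
    ).fetchall()


def load_active_for_endpoint(conn, endpoint):
    """Load only the rows for jobs that have not completed."""
    return conn.execute(
        "SELECT job_id, state FROM job_status "
        "WHERE endpoint = ? AND state <> 'ended'",
        (endpoint,),
    ).fetchall()


unfiltered = load_all_for_endpoint(conn, "endpointA")
filtered = load_active_for_endpoint(conn, "endpointA")
print(len(unfiltered), len(filtered))
```

In this toy data set the unfiltered query materializes every historical row for the endpoint, while the filtered one returns only the jobs still in flight. The gap grows with the number of completed jobs retained in the tables.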
Problem conclusion
The database queries upon endpoint start were refined to ignore completed jobs. For the second problem, the WSGrid processing was modified to short-circuit the job log streaming from the endpoint as long as the job is still in "submitted" state. The fix for this APAR is currently targeted for inclusion in fixpack 8.0.0.3. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?uid=swg27022998
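The short-circuit described above can be sketched as a simple guard, assuming the scheduler knows the job's current state before it asks the endpoint for log parts. This is an illustration rather than the actual fix: `JobState`, `fetch_log_parts`, and `fake_endpoint_request` are hypothetical names, and only the "submitted" and "restartable" states are taken from this APAR.

```python
from enum import Enum


class JobState(Enum):
    # "submitted" and "restartable" appear in the APAR text;
    # the other members are assumed for illustration.
    SUBMITTED = "submitted"
    EXECUTING = "executing"
    RESTARTABLE = "restartable"
    ENDED = "ended"


def fetch_log_parts(state, request_parts):
    """Return job log parts for streaming back to the client.

    While the job is still in submitted state, skip the request to the
    endpoint entirely and return no parts.
    """
    if state is JobState.SUBMITTED:
        return []
    return request_parts()


# Stand-in for the remote call to the dispatch endpoint (hypothetical).
calls = []


def fake_endpoint_request():
    calls.append("requested")
    return ["part-1", "part-2"]


print(fetch_log_parts(JobState.SUBMITTED, fake_endpoint_request))
print(fetch_log_parts(JobState.EXECUTING, fake_endpoint_request))
```

The point of the guard is that no log request reaches the endpoint while dispatch may still be in progress, which is the window in which the APAR describes the timing problem.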
Temporary fix
Comments
APAR Information
APAR number
PM70434
Reported component name
WXD COMPUTE GRI
Reported component ID
5725C9301
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-08-07
Closed date
2012-12-17
Last modified date
2012-12-17
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WXD COMPUTE GRI
Fixed component ID
5725C9301
Applicable component levels
R800 PSY
UP
Document Information
Modified date:
28 April 2022