A fix is available
APAR status
Closed as program error.
Error description
Compute Grid V8.0 jobs stopped running after the customer recycled their "batch" cluster in response to database configuration issues. In addition, Compute Grid 8.0 tends to discard the joblog messages stating that a given job cannot be dispatched.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: Users of the Java batch function in IBM
*                 WebSphere Application Server V8.5
****************************************************************
* PROBLEM DESCRIPTION: Issues re-establishing communications
*                      between ComputeGrid/batch scheduler
*                      server and endpoint server(s) after a
*                      scheduler or endpoint is recycled.
*                      For example, after an endpoint is
*                      quiesced - resulting in jobs not
*                      getting dispatched (stuck in
*                      submitted state).
****************************************************************
* RECOMMENDATION:
****************************************************************
The Java batch architecture uses a single scheduler server to dispatch work to a number of endpoint servers hosting the batch application, with the scheduler establishing communication with the endpoint servers using a "heart beat" mechanism.

There was a timing window where communication with a particular endpoint server was not getting reestablished in the case that the endpoint was recycled, as well as in the case where the endpoint(s) remained active while the scheduler was recycled. There was also a bug such that if an endpoint was quiesced, then recycled, communication with that particular endpoint wasn't reestablished correctly when the endpoint came back up.

In both cases, you can experience the symptom of jobs appearing to be "stuck in submitted state", that is, not getting dispatched to the appropriate endpoint. It is also possible that a given cluster member does not get any jobs dispatched to it while other cluster member(s) receive the job dispatches.
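The scheduler/endpoint relationship described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the product's implementation: the class names, the `incarnation` counter, the `heartbeat_timeout` value, and the round-robin dispatch are all hypothetical and were chosen just to show how stale state about a quiesced endpoint could leave jobs in the submitted state, and how discarding that state when a recycled endpoint reports in lets dispatch resume.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointRecord:
    """Scheduler-side view of one endpoint (hypothetical structure)."""
    incarnation: int        # assumed to change each time the endpoint restarts
    last_heartbeat: float   # time of the most recent heartbeat
    quiesced: bool = False  # True while the endpoint accepts no new work


@dataclass
class Scheduler:
    """Toy scheduler that dispatches only to live, non-quiesced endpoints."""
    heartbeat_timeout: float = 30.0                  # assumed value, in seconds
    endpoints: dict = field(default_factory=dict)    # name -> EndpointRecord
    submitted: list = field(default_factory=list)    # jobs awaiting dispatch
    dispatched: dict = field(default_factory=dict)   # job -> endpoint name

    def on_heartbeat(self, name, incarnation, now):
        record = self.endpoints.get(name)
        if record is None or record.incarnation != incarnation:
            # New or recycled endpoint: drop any stale state, including a
            # quiesced flag left over from the previous incarnation.
            self.endpoints[name] = EndpointRecord(incarnation, now)
        else:
            record.last_heartbeat = now

    def quiesce(self, name):
        self.endpoints[name].quiesced = True

    def eligible(self, now):
        # An endpoint is eligible if it is not quiesced and its last
        # heartbeat is within the timeout.
        return sorted(
            name
            for name, rec in self.endpoints.items()
            if not rec.quiesced
            and now - rec.last_heartbeat <= self.heartbeat_timeout
        )

    def submit(self, job):
        self.submitted.append(job)

    def dispatch(self, now):
        targets = self.eligible(now)
        if not targets:
            return  # nothing reachable: jobs stay in the submitted state
        # Spread the waiting jobs round-robin over the eligible endpoints.
        for index, job in enumerate(self.submitted):
            self.dispatched[job] = targets[index % len(targets)]
        self.submitted.clear()
```

In this model, quiescing the only endpoint and then calling `dispatch` leaves a job in `submitted`. A later heartbeat from the same endpoint name with a new `incarnation` replaces the stale record, so the next `dispatch` call assigns the job. If the branch in `on_heartbeat` only refreshed `last_heartbeat`, the old `quiesced` flag would persist and the job would never leave the submitted state, which mirrors the symptom described in this APAR.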
Problem conclusion
The quiesce bug was fixed and the timing window closed so that endpoint and scheduler servers can be recycled with dispatch resuming normally once both are up and running. The fix for this APAR is currently targeted for inclusion in fix pack 8.5.0.2. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
Temporary fix
Comments
APAR Information
APAR number
PM71892
Reported component name
WEBS APP SERV N
Reported component ID
5724H8800
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-08-30
Closed date
2012-12-18
Last modified date
2012-12-18
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WEBS APP SERV N
Fixed component ID
5724H8800
Applicable component levels
R850 PSY
UP
Document Information
Modified date:
01 November 2021