IBM Support

PM71892: COMPUTE GRID JOB SCHEDULER FAILS TO RESUME DISPATCHING JOBS TO AN ENDPOINT SERVER THAT WAS QUIESCED BEFORE BEING RECYCLED.

APAR status

  • Closed as program error.

Error description

  • Compute Grid V8.0 jobs stopped running after the customer
    recycled their "batch" cluster to address database
    configuration issues. In addition, Compute Grid 8.0 tends to
    wipe out joblog messages stating that a given job cannot be
    dispatched.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  Users of the Java batch function in IBM     *
    *                  WebSphere Application Server V8.5           *
    ****************************************************************
    * PROBLEM DESCRIPTION: Issues re-establishing communications   *
    *                      between ComputeGrid/batch scheduler     *
    *                      server and endpoint server(s) after a   *
    *                      scheduler or endpoint is recycled.      *
    *                      For example, after an endpoint is       *
    *                      quiesced - resulting in jobs not        *
    *                      getting dispatched (stuck in            *
    *                      submitted state).                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The Java batch architecture uses a single scheduler server to
    dispatch work to a number of endpoint servers hosting the
    batch application. The scheduler establishes communication
    with the endpoint servers using a "heartbeat" mechanism.
    There was a timing window in which communication with a
    particular endpoint server was not reestablished after that
    endpoint was recycled, or after the scheduler was recycled
    while the endpoint(s) remained active.
    There was also a bug: if an endpoint was quiesced and then
    recycled, communication with that endpoint was not
    reestablished correctly when the endpoint came back up.
    In both cases, the symptom is that jobs appear to be "stuck
    in submitted state", that is, they are not dispatched to the
    appropriate endpoint. It is also possible that a given
    cluster member receives no job dispatches while other cluster
    member(s) continue to receive them.
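
    The quiesce-then-recycle failure described above can be
    illustrated with a minimal sketch. This is not the Compute Grid
    implementation: the class HeartbeatRegistrySketch, its methods,
    and the "incarnation" counter are hypothetical names chosen for
    illustration. The sketch assumes each endpoint sends a heartbeat
    carrying a value that changes whenever the endpoint process
    restarts, so the scheduler can discard a stale quiesced flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from WebSphere or Compute Grid.
class HeartbeatRegistrySketch {
    // Per-endpoint state as seen by the scheduler.
    private static final class EndpointState {
        long incarnation;      // assumed to change each time the endpoint restarts
        long lastHeartbeatMs;  // time of the most recent heartbeat
        boolean quiesced;      // true while the endpoint should receive no new work
    }

    private final Map<String, EndpointState> endpoints = new HashMap<>();
    private final long timeoutMs;

    HeartbeatRegistrySketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Record a heartbeat. A changed incarnation means the endpoint was
    // recycled, so state left over from the old process is dropped.
    synchronized void onHeartbeat(String name, long incarnation, long nowMs) {
        EndpointState s = endpoints.computeIfAbsent(name, k -> new EndpointState());
        if (s.incarnation != incarnation) {
            s.incarnation = incarnation;
            s.quiesced = false;
        }
        s.lastHeartbeatMs = nowMs;
    }

    // Mark a known endpoint as quiesced; unknown names are ignored.
    synchronized void quiesce(String name) {
        EndpointState s = endpoints.get(name);
        if (s != null) {
            s.quiesced = true;
        }
    }

    // An endpoint may receive jobs only if it is known, not quiesced,
    // and has sent a heartbeat within the timeout.
    synchronized boolean isDispatchable(String name, long nowMs) {
        EndpointState s = endpoints.get(name);
        return s != null && !s.quiesced && nowMs - s.lastHeartbeatMs <= timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatRegistrySketch registry = new HeartbeatRegistrySketch(5_000);
        registry.onHeartbeat("endpoint1", 1, 0);
        registry.quiesce("endpoint1");
        // Quiesced, so not eligible for dispatch.
        System.out.println(registry.isDispatchable("endpoint1", 1_000));
        // Recycled endpoint reports a new incarnation; it becomes eligible again.
        registry.onHeartbeat("endpoint1", 2, 2_000);
        System.out.println(registry.isDispatchable("endpoint1", 2_500));
    }
}
```

    If the quiesced flag were kept across the incarnation change, the
    recycled endpoint would never become dispatchable again, which
    would match the "stuck in submitted state" symptom in this
    simplified model.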
    

Problem conclusion

  • The quiesce bug was fixed and the timing window closed so
    that endpoint and scheduler servers can be recycled with
    dispatch resuming normally once both are up and running.
    
    The fix for this APAR is currently targeted for inclusion in
    fix pack 8.5.0.2. Please refer to the Recommended Updates page
    for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM71892

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    850

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-08-30

  • Closed date

    2012-12-18

  • Last modified date

    2012-12-18

  • APAR is sysrouted FROM one or more of the following:

    PM69782

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R850 PSY UP

Document Information

Modified date:
01 November 2021