IBM Support

JR47860: MISSING TASK TIMER EXECUTIONS, STUCK INSTANCES DUE TO QUEUE FULL CONDITION

APAR status

  • Closed as program error.

Error description

  • Customer experiences the following exception, which then caused
    missing task timer executions and stuck instances.
    [8/31/13 4:01:08:261 EDT] 000068dc wle_ucaexcept E   CWLLG0203E:
    Undercover Agent job failed. Task 22289533 job details are:
    class=com.lombardisoftware.bpd.runtime.engine.quartz.DbNotificationBpdTask
    parameters=[605640;14267244]  Error: UOWManager transaction
    processing failed; nested exception is
    com.ibm.wsspi.uow.UOWException:
    javax.transaction.RollbackException
    [8/31/13 4:01:08:453 EDT] 000068e0 XATransaction E   J2CA0027E:
    An exception occurred while invoking prepare on an XA Resource
    Adapter from DataSource
    jms/com.ibm.lombardi/EventEmissionQueueFactory, within
    transaction ID {XidImpl: formatId(57415344), gtrid_length(36),
    bqual_length(54),
    data(00000140d363c82b00000001354b872ca132aa12e00dbc22cb3da98070acb91ff5ce17fb00000140d363c82b00000001354b872ca132aa12e00dbc22cb3da98070acb91ff5ce17fb000000010000000000000000000000000002)} :
    javax.transaction.xa.XAException: CWSIC8007E: An exception was
    caught from the remote server with Probe Id 3-013-0010.
    Exception: CWSIC2029E: This transaction cannot commit as an
    operation that was performed within the transaction boundary
    failed. The first operation that failed generated the following
    exception:
    com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException:
    CWSIK0025E: The destination LombardiEventEmitterInputQueue on
    messaging engine
    prod.Messaging.000-MONITOR.ProcessServerCell01.Bus is not
    available because the high limit for the number of messages for
    this destination has already been reached...
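
    To check a SystemOut.log excerpt for this condition, a small
    scan for the CWSIK0025E message can help. The following is a
    minimal sketch only: the message ID and the sample text come
    from the log above, while the regular expression and the
    function name find_queue_full are assumptions made for
    illustration, not an IBM-provided tool.

```python
import re

# Assumption: the message keeps the wording seen in the log excerpt above.
# Whitespace is matched loosely because the log text may be line-wrapped.
QUEUE_FULL = re.compile(
    r"CWSIK0025E:\s+The\s+destination\s+(?P<destination>\S+)\s+on\s+"
    r"messaging\s+engine\s+(?P<engine>\S+)\s+is\s+not\s+available"
)


def find_queue_full(log_text):
    """Return (destination, messaging engine) pairs for each CWSIK0025E hit."""
    return [(m.group("destination"), m.group("engine"))
            for m in QUEUE_FULL.finditer(log_text)]


sample = (
    "CWSIK0025E: The destination LombardiEventEmitterInputQueue on\n"
    "messaging\n"
    "engine prod.Messaging.000-MONITOR.ProcessServerCell01.Bus is not\n"
    "available because the high limit for the number of messages for\n"
    "this\n"
    "destination has already been reached..."
)
print(find_queue_full(sample))
```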
    

Local fix

  • n/a
    

Problem summary

  • The Event Manager is responsible for scheduling and driving the
    execution of work in the Process Server/Process Center, such as
    invocation of Undercover Agents (UCAs), execution of Business
    Process Definitions (BPDs), invocation of BPD system task
    implementations, and BPD timer events. This is done through
    so-called Event Manager tasks, which represent the jobs the
    Event Manager is responsible for scheduling.
    In exceptional situations, such as a "queue full condition" of
    the monitor event queue, those jobs are re-executed in an
    attempt to overcome the exceptional situation. A
    re-execute-limit specified in 80EventManager.xml (or configured
    in 100Custom.xml) determines the number of retries.
    Once that limit is reached, the respective Event Manager task is
    not retried anymore. This can result in a BPD instance that no
    longer continues its navigation: it hangs.
    PROBLEM DETAILED DESCRIPTION:
    The problem was observed when monitoring was enabled and the
    Process Server produced more monitor events than could be
    processed fast enough. Thus the respective queue filled up and
    resulted in a "queue full condition". Consequently the BPD
    transactions that tried to emit those monitor events failed and
    were rolled back. Respective Event Manager tasks were retried
    until reaching the re-execute-limit of the Event Manager. Upon
    reaching that limit, corresponding BPD instances were not
    navigated anymore.
    As a result many hanging process instances were observed that
    could only be recovered via the move token feature.
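
    The failure mode described above can be illustrated with a
    small simulation. This is a sketch under stated assumptions,
    not the Event Manager implementation: QueueFullError, run_task
    and make_job are hypothetical names, and the way retries are
    counted against the limit is assumed for illustration.

```python
class QueueFullError(Exception):
    """Stands in for the rollback caused by the full monitor event queue."""


def run_task(job, re_execute_limit):
    """Run job, retrying on QueueFullError up to re_execute_limit times.

    Returns (state, retries). Before this APAR, a task that exhausted the
    limit was simply not retried anymore, which left the instance stuck.
    """
    retries = 0
    while True:
        try:
            job()
            return ("completed", retries)
        except QueueFullError:
            if retries == re_execute_limit:
                return ("abandoned", retries)
            retries += 1


def make_job(failures):
    """Build a job that fails the first `failures` times, then succeeds."""
    remaining = [failures]

    def job():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise QueueFullError("monitor event queue is full")

    return job


# The queue drains after two failed attempts: the task recovers by itself.
print(run_task(make_job(2), 5))
# The queue stays full longer than the limit allows: the task is given up.
print(run_task(make_job(100), 5))
```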
    

Problem conclusion

  • The problem is solved by enhancing the Event Manager such that
    Event Manager tasks that were retried until reaching the
    re-execute-limit can be resumed by administrative means once the
    exceptional situation is resolved.
    With this code fix in place, when a work item has reached the
    maximum retry execution limit or the queue is full, the item is
    listed in the Event Manager console page and marked with a
    scheduled execution date of "2099-02-01". Due to localization
    it may also appear as "1/2/99".
    * With this APAR, upon reaching the re-execute-limit, the
    respective Event Manager task is put on hold.
    * In addition, a task is created and assigned to the EM
    administrator (as specified via notify-error in
    80EventManager.xml).
    Note: The system can be configured such that the EM
    administrator is notified about such tasks via e-mail, and thus
    about the exceptional situation.
    * To resume Event Manager tasks that are on hold, an
    administrative command is provided that replays such Event
    Manager tasks so that they can be scheduled by the Event
    Manager again: BPMReplayOnHoldEMTasks.
      Parameters:
        getNumberOfTasks - retrieves the number of Event Manager
           tasks that are on hold
           Set this parameter to true if the BPMReplayOnHoldEMTasks
           command should retrieve the number of Event Manager tasks
           available for replay.
        maxNumberOfTasksToReplay - replays on-hold Event Manager
           tasks up to a maximum number specified
           Use this parameter to set an upper limit for the number
           of on-hold Event Manager tasks to be replayed.
        bpdInstanceId - replays on-hold Event Manager tasks for the
           BPD instance specified
           Specifies for which BPD instance on-hold Event Manager
           tasks should be replayed.
      Note that the parameters are mutually exclusive.
    To invoke BPMReplayOnHoldEMTasks you must start wsadmin and
    connect it to the process server or process center.
        E.g., wsadmin -conntype SOAP -port 4080 -host PC1.mycompany.com -user admin -password admin -lang jython
      Examples
        Query the number of available on-hold Event Manager tasks in
        the system:
           wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-getNumberOfTasks true]')
           'The BPMReplayOnHoldEMTasks command found 20 on hold Event Manager Task(s) ready for replay.'
        Replay 13 on-hold Event Manager tasks:
           wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-maxNumberOfTasksToReplay 13]')
           'The BPMReplayOnHoldEMTasks command replayed 13 on hold Event Manager Task(s).'
        Replay on-hold Event Manager tasks for BPD instance 49:
           wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-bpdInstanceId 49]')
           'The BPMReplayOnHoldEMTasks command replayed 1 on hold Event Manager Task(s).'
        Replay all on-hold Event Manager tasks:
           wsadmin>AdminTask.BPMReplayOnHoldEMTasks()
           'The BPMReplayOnHoldEMTasks command replayed 20 on hold Event Manager Task(s).'
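
    When scripting these calls, it can help to build the option
    string in one place and to extract the task count from the
    returned message. The following is a minimal sketch: the
    helper names build_replay_options and parse_task_count are
    hypothetical, and the parsing assumes the message wording
    shown in the examples above. It uses only syntax that should
    also work in older Jython levels.

```python
import re


def build_replay_options(get_number_of_tasks=False, max_tasks=None,
                         bpd_instance_id=None):
    """Build the option string for BPMReplayOnHoldEMTasks.

    Returns None when no parameter is chosen; in that case the command
    is called without arguments to replay all on-hold tasks.
    """
    chosen = []
    if get_number_of_tasks:
        chosen.append("-getNumberOfTasks true")
    if max_tasks is not None:
        chosen.append("-maxNumberOfTasksToReplay %d" % max_tasks)
    if bpd_instance_id is not None:
        chosen.append("-bpdInstanceId %d" % bpd_instance_id)
    if len(chosen) > 1:
        # The APAR text states that the parameters are mutually exclusive.
        raise ValueError("BPMReplayOnHoldEMTasks parameters are mutually exclusive")
    if not chosen:
        return None
    return "[%s]" % chosen[0]


def parse_task_count(message):
    """Extract the task count from the command's result message, or None."""
    match = re.search(r"(\d+)\s+on\s+hold", message)
    if match is None:
        return None
    return int(match.group(1))


print(build_replay_options(max_tasks=13))
print(parse_task_count("The BPMReplayOnHoldEMTasks command found 20 on hold "
                       "Event Manager Task(s) ready for replay."))
```

    Inside wsadmin, the resulting string would be passed to
    AdminTask.BPMReplayOnHoldEMTasks as in the examples above.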
    Notes:
    - Before replaying on-hold Event Manager tasks, analyse the root
      cause that led to the on-hold Event Manager tasks. Replay
      on-hold Event Manager tasks after the root cause is identified
      and resolved.
    - When an Event Manager task is replayed, the associated
      notification task for the administrator is deleted.
    - If there is a large number of on-hold Event Manager tasks in
      the system, do not replay all Event Manager tasks at once.
      Start by replaying a chunk of 100 Event Manager tasks. Then
      replay a larger chunk. As long as the performance is
      satisfactory, keep increasing the chunk size until all on-hold
      Event Manager tasks are replayed.
      Note that replaying too many on-hold Event Manager tasks in
      one chunk can create a lot of load on the system. To cope with
      this load, the system has to be tuned carefully.
      It is recommended to replay on-hold Event Manager tasks during
      times of low system load.
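
    The chunked replay recommended in the notes above could be
    scripted roughly as follows. This is a sketch, not a supported
    utility: replay_in_chunks, FakeSystem and performance_ok are
    hypothetical names, and doubling the chunk size is an assumed
    growth policy (the notes only say to start with 100 and keep
    increasing). The demo runs against an in-memory stand-in
    rather than a real server.

```python
def replay_in_chunks(replay, count_on_hold, first_chunk=100, growth=2,
                     performance_ok=lambda: True):
    """Replay on-hold tasks in growing chunks.

    replay(n) must replay up to n tasks and return how many were replayed.
    count_on_hold() must return the number of tasks still on hold.
    Returns the list of chunk results, in order.
    """
    chunk = first_chunk
    history = []
    while count_on_hold() > 0:
        replayed = replay(chunk)
        history.append(replayed)
        if replayed == 0:
            break  # no progress; stop instead of looping forever
        if performance_ok():
            chunk = chunk * growth  # assumed policy: grow only while healthy
    return history


class FakeSystem:
    """In-memory stand-in for a server holding on-hold Event Manager tasks."""

    def __init__(self, on_hold):
        self.on_hold = on_hold

    def count(self):
        return self.on_hold

    def replay(self, limit):
        replayed = min(limit, self.on_hold)
        self.on_hold -= replayed
        return replayed


fake = FakeSystem(650)
print(replay_in_chunks(fake.replay, fake.count))
```

    In wsadmin, the two callables would wrap
    AdminTask.BPMReplayOnHoldEMTasks with -maxNumberOfTasksToReplay
    and -getNumberOfTasks respectively and read the count from the
    returned message; performance_ok would be whatever health check
    the administrator trusts.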
    FIX AVAILABILITY:
    iFix for 8.0.1.1 is available on Fix Central, search for APAR
    JR47860 at http://www.ibm.com/support/fixcentral/
    The fix is also targeted for inclusion in the next fix pack for
    BPM 8.0.1 and BPM 8.5.0.
    When obtaining any of the above fixes, be sure to download the
    accompanying readme for the fix and for any prerequisite fixes,
    and review them thoroughly.
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR47860

  • Reported component name

    BPM ADVANCED

  • Reported component ID

    5725C9400

  • Reported release

    801

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-09-23

  • Closed date

    2013-12-09

  • Last modified date

    2015-02-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    BPM STANDARD

  • Fixed component ID

    5725C9500

Applicable component levels

  • R801 PSY

       UP

  • R850 PSY

       UP

Document Information

Modified date:
07 October 2021