IBM Support

PM64400: HUNG THREAD DETECTION FOR ALARM THREADS.

Fixes are available

APAR status

  • Closed as program error.

Error description

  • Occasionally, WebSphere Application Server code running on
    alarm threads gets into a state where the work takes a long
    time to complete. If all the alarm threads get into this
    condition, no further alarm callbacks occur until one of
    the alarm threads completes. The lack of timely callbacks to
    pending alarms can cause numerous problems within the
    server, such as DCS view instability, EJB cache growth, and
    OutOfMemory conditions.
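
The starvation described above can be reproduced in miniature with a plain Java executor. This is only an illustrative sketch: a fixed-size java.util.concurrent pool stands in for the alarm thread pool, and the class name AlarmStarvationSketch and the 200 ms wait are assumptions made for the example, not part of WebSphere Application Server.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; a fixed thread pool stands in for the alarm thread pool.
public class AlarmStarvationSketch {

    /**
     * Returns true if a queued callback could NOT run while every pool thread
     * was busy, and DID run once the busy threads were released.
     */
    public static boolean demonstrate(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy every thread with work that does not finish until released.
            for (int i = 0; i < poolSize; i++) {
                pool.submit(() -> {
                    release.await();
                    return null;
                });
            }

            // A further "alarm callback" is queued behind the long-running work.
            Future<String> callback = pool.submit(() -> "fired");

            boolean starved;
            try {
                callback.get(200, TimeUnit.MILLISECONDS);
                starved = false; // the callback ran even though all threads were busy
            } catch (TimeoutException e) {
                starved = true;  // no thread was free to run the callback
            }

            // Let the long-running work complete; the pending callback can now run.
            release.countDown();
            boolean firedAfterRelease =
                    "fired".equals(callback.get(5, TimeUnit.SECONDS));
            return starved && firedAfterRelease;
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback starved until release: " + demonstrate(2));
    }
}
```

The pending callback is delayed for exactly as long as every pool thread stays busy, which is the condition the alarm thread monitor is meant to surface.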
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server.                                     *
    ****************************************************************
    * PROBLEM DESCRIPTION: The application server lacks the        *
    *                      capability to monitor alarm threads.    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The application server will now monitor the activity of
    threads on which system alarms run. When a system alarm
    thread has been active longer than the time defined by the
    alarm thread monitor threshold, the application server logs
    the following warning in the system log. This message indicates
    the name of the thread that is not responding, the length of
    time that the thread has already been active, and the
    exception stack of the thread, which identifies the system
    component.
    UTLS0008W: The alarm thread threadname has been active for n
    milliseconds and may be hung. totalthreads threadstack
    In this message, threadname is the name that appears in a JVM
    thread dump, n is approximately how long the thread was
    active, totalthreads is an overall assessment of the system
    threads, and threadstack is the exception stack of the thread.
    If the alarm work eventually completes, the following message
    is written to the system log. This message indicates the thread
    that produced the false alarm.
    UTLS0009W: Alarm Thread threadname was previously reported to
    be hung but has completed.  It was active for approximately
    n milliseconds.
    In this message, threadname is the name that appears in a JVM
    thread dump, and n is approximately how long the thread was
    active.
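
When scanning a system log for these two messages, a small parser can pull out the thread name and the active time. The following is a minimal sketch, assuming each message appears on a single log line and that n is written as plain digits; the class AlarmThreadLogParser and its Report holder are hypothetical names, and the sample line in main is made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of WebSphere Application Server.
public class AlarmThreadLogParser {

    /** Hypothetical value holder for one parsed message. */
    public static final class Report {
        public final String messageId;
        public final String threadName;
        public final long activeMillis;

        Report(String messageId, String threadName, long activeMillis) {
            this.messageId = messageId;
            this.threadName = threadName;
            this.activeMillis = activeMillis;
        }
    }

    // Patterns follow the message texts quoted in this APAR.
    // Assumption: the active time is logged as plain digits without separators.
    private static final Pattern HUNG = Pattern.compile(
            "UTLS0008W: The alarm thread (.+?) has been active for (\\d+) "
            + "milliseconds and may be hung\\.");

    private static final Pattern COMPLETED = Pattern.compile(
            "UTLS0009W: Alarm Thread (.+?) was previously reported to be hung "
            + "but has completed\\.\\s+It was active for approximately (\\d+) "
            + "milliseconds\\.");

    /** Returns the parsed report, or empty if the line holds neither message. */
    public static Optional<Report> parse(String line) {
        Matcher hung = HUNG.matcher(line);
        if (hung.find()) {
            return Optional.of(new Report(
                    "UTLS0008W", hung.group(1), Long.parseLong(hung.group(2))));
        }
        Matcher done = COMPLETED.matcher(line);
        if (done.find()) {
            return Optional.of(new Report(
                    "UTLS0009W", done.group(1), Long.parseLong(done.group(2))));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample line, shaped like the UTLS0009W text above.
        String sample = "UTLS0009W: Alarm Thread Alarm : 3 was previously "
                + "reported to be hung but has completed.  It was active for "
                + "approximately 52000 milliseconds.";
        parse(sample).ifPresent(r -> System.out.println(
                r.messageId + " | " + r.threadName + " | " + r.activeMillis));
    }
}
```

Pairing each UTLS0008W report with a later UTLS0009W for the same thread name separates threads that eventually completed from threads that were still active when the log ended.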
    Typically, system alarms do not process heavy loads because
    such activity might delay the processing of other
    scheduled system alarms, which in turn might impact server
    behavior. The UTLS0008W message is intended to help IBM
    Support personnel investigate problems potentially caused by
    system alarm behavior.
    All of the system alarms share a common alarm thread pool. The
    properties which govern the monitoring of this thread pool can
    be tuned using the administrative console. You can reduce the
    frequency at which WebSphere Application Server generates alarm
    hung thread messages by adjusting the alarm thread monitor
    check interval or threshold.
    To control how the application server monitors the threads on
    which system alarms run, add the following generic JVM
    arguments to the server settings. Note that the system alarm
    thread monitor is enabled by default.
    Name
    -Dcom.ibm.websphere.alarmthreadmonitor.enabled
    Value
    Set to true to enable, or false to disable, the system alarm
    thread monitor.
    Default
    True
    Name
    -Dcom.ibm.websphere.alarmthreadmonitor.generate.javacore
    Value
    Set to any value to cause a javacore dump to be created
    when a hung system alarm thread is detected. The threads
    section of the javacore dump can be analyzed to determine
    what the reported thread and other related threads are doing.
    Default
    Unset
    Name
    -Dcom.ibm.websphere.alarmthreadmonitor.checkinterval.millis
    Value
    The frequency, in milliseconds, at which system alarm
    threads are interrogated. Set the value to zero to disable
    system alarm hung thread detection. The maximum interval is
    600000 (10 minutes).
    Default
    10000 (10 seconds)
    Name
    -Dcom.ibm.websphere.alarmthreadmonitor.threshold.millis
    Value
    Set to any integer value between 10000 (10 seconds) and
    600000 (10 minutes). This argument specifies the length of
    time, in milliseconds, that a system alarm thread can be
    active before it is considered non-responsive. Any system
    alarm thread that is detected as active for longer than
    this length of time is reported as hung.
    Default
    40000 (40 seconds)
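
As an illustration of how the four arguments combine, the sketch below assembles a generic JVM arguments string and rejects values outside the ranges listed above. The class AlarmMonitorArgsSketch and its method are hypothetical helpers, not an IBM API. It assumes that any check interval from 0 through 600000 is acceptable, because the text states only the zero and maximum cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for composing the arguments described in this APAR.
public class AlarmMonitorArgsSketch {

    private static final String PREFIX = "-Dcom.ibm.websphere.alarmthreadmonitor.";

    /**
     * Builds a space-separated generic JVM arguments string.
     * Ranges follow the APAR text: check interval 0..600000 (0 disables
     * detection), threshold 10000..600000.
     */
    public static String buildArgs(boolean enabled,
                                   long checkIntervalMillis,
                                   long thresholdMillis,
                                   boolean generateJavacore) {
        if (checkIntervalMillis < 0 || checkIntervalMillis > 600_000) {
            throw new IllegalArgumentException(
                    "check interval must be between 0 and 600000 ms");
        }
        if (thresholdMillis < 10_000 || thresholdMillis > 600_000) {
            throw new IllegalArgumentException(
                    "threshold must be between 10000 and 600000 ms");
        }

        List<String> args = new ArrayList<>();
        args.add(PREFIX + "enabled=" + enabled);
        args.add(PREFIX + "checkinterval.millis=" + checkIntervalMillis);
        args.add(PREFIX + "threshold.millis=" + thresholdMillis);
        if (generateJavacore) {
            // Any value enables javacore generation; "true" is used for readability.
            args.add(PREFIX + "generate.javacore=true");
        }
        return String.join(" ", args);
    }

    public static void main(String[] args) {
        // Example: check every 20 seconds, report threads active for over 60 seconds.
        System.out.println(buildArgs(true, 20_000, 60_000, true));
    }
}
```

Raising the threshold or the check interval, as in the example call, reduces how often hung alarm thread messages are generated.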
    

Problem conclusion

  • Apply APAR PM64400 to enhance the serviceability of the
    application server by adding the capability to monitor alarm
    threads.
    
    The fix for this APAR is currently targeted for inclusion in
    fix pack 7.0.0.25.  Please refer to the Recommended Updates
    page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM64400

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    700

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-05-11

  • Closed date

    2012-06-20

  • Last modified date

    2012-06-20

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R700 PSY   UP

Document Information

Modified date:
28 October 2021