IBM Support

PM29179: HAMANAGER BULLETIN BOARD POST IS DROPPED DURING A REBUILD OF ODC TREE

Fixes are available

7.0.0.19: WebSphere Application Server V7.0 Fix Pack 19
7.0.0.21: WebSphere Application Server V7.0 Fix Pack 21
7.0.0.23: WebSphere Application Server V7.0 Fix Pack 23
7.0.0.25: WebSphere Application Server V7.0 Fix Pack 25
7.0.0.27: WebSphere Application Server V7.0 Fix Pack 27
7.0.0.29: WebSphere Application Server V7.0 Fix Pack 29
6.1.0.47: WebSphere Application Server V6.1 Fix Pack 47
7.0.0.31: WebSphere Application Server V7.0 Fix Pack 31
7.0.0.27: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.33: WebSphere Application Server V7.0 Fix Pack 33
7.0.0.35: WebSphere Application Server V7.0 Fix Pack 35
7.0.0.37: WebSphere Application Server V7.0 Fix Pack 37
7.0.0.39: WebSphere Application Server V7.0 Fix Pack 39
7.0.0.41: WebSphere Application Server V7.0 Fix Pack 41
7.0.0.43: WebSphere Application Server V7.0 Fix Pack 43
7.0.0.45: WebSphere Application Server V7.0 Fix Pack 45
6.1.0.39: Java SDK 1.5 SR12 FP4 Cumulative Fix for WebSphere Application Server
6.1.0.41: Java SDK 1.5 SR12 FP5 Cumulative Fix for WebSphere Application Server
6.1.0.43: Java SDK 1.5 SR13 Cumulative Fix for WebSphere Application Server
6.1.0.45: Java SDK 1.5 SR14 Cumulative Fix for WebSphere Application Server
6.1.0.47: Java SDK 1.5 SR16 Cumulative Fix for WebSphere Application Server
7.0.0.19: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.21: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere
7.0.0.23: Java SDK 1.6 SR10 FP1 Cumulative Fix for WebSphere
7.0.0.25: Java SDK 1.6 SR11 Cumulative Fix for WebSphere Application Server
7.0.0.27: Java SDK 1.6 SR12 Cumulative Fix for WebSphere Application Server
7.0.0.29: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.45: Java SDK 1.6 SR16 FP60 Cumulative Fix for WebSphere Application Server
7.0.0.31: Java SDK 1.6 SR15 Cumulative Fix for WebSphere Application Server
7.0.0.35: Java SDK 1.6 SR16 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.37: Java SDK 1.6 SR16 FP3 Cumulative Fix for WebSphere Application Server
7.0.0.39: Java SDK 1.6 SR16 FP7 Cumulative Fix for WebSphere Application Server
7.0.0.41: Java SDK 1.6 SR16 FP20 Cumulative Fix for WebSphere Application Server
7.0.0.43: Java SDK 1.6 SR16 FP41 Cumulative Fix for WebSphere Application Server

APAR status

  • Closed as program error.

Error description

  • This problem can occur in a complex high availability (HA)
    manager environment.  It was discovered in a cell with more
    than 500 appservers divided among 250 clusters, with the cell
    divided into 8 coregroups.
    .
    The failure scenario is as follows:
    1. One AppServer, server1, encountered an OutOfMemoryError due
    to a memory leak in an application.  It became unresponsive to
    its node agent, so the node agent restarted server1.
    .
    2. Another AppServer, server2, under the same node agent also
    encountered an OutOfMemoryError, but it remained responsive, so
    the node agent did not restart it.
    .
    3. Under a different node agent, the AppServer server3 was
    restarted.  server3 did not finish restarting because it was
    blocked by transaction (tx) recovery; it logged the CWRLS0030W
    message continually.
    .
    4. The cause of the CWRLS0030W message was on server2.  This
    AppServer was stopped and restarted using the stopServer and
    startServer command-line commands.  It came up fine, and
    server3 also finished starting.
    .
    5. For an unknown reason, the hung server, server2, appears to
    have negatively affected not only server3 but also all members
    of its coregroup, CoreGroup1.
    .
    6. There is also a DataPower system in the cell.  The problem
    for which this APAR was created is that incorrect ODCTree
    status information was propagated from CoreGroup1 to the
    DefaultCoreGroup.  The ODCInfo reported the status of
    server2's cluster incorrectly.
    .
    Once the servers in the cluster that were incorrectly reported
    as down were restarted, the ODCInfo reported the proper state
    information.
    .
    The symptom one might see without DataPower is a DCSV8030I
    message that reports the wrong status for a server.  The reason
    is that the bulletin board, which is supposed to contain status
    messages, has dropped a message, so the wrong information is
    reported.
    

Local fix

  • Restart all the nodes or servers whose status is reported
    incorrectly.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server                                      *
    ****************************************************************
    * PROBLEM DESCRIPTION: Work ceases being routed to a           *
    *                      server after a coregroup bridge is      *
    *                      restarted.                              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    After a coregroup bridge rebuild, bulletin board subscriber
    updates intermittently drop posts from running servers. These
    dropped posts are interpreted by workload management to mean
    that these servers are no longer routable.
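
The effect of a dropped post can be illustrated with a minimal sketch. This is not WebSphere code: the class RoutingViewSketch, the method routableServers, and the "STARTED" status string are hypothetical names chosen for illustration. The sketch assumes a subscriber that rebuilds its routable set purely from the posts present in an update, so a server whose post is missing is treated the same as a server that is down.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model: a subscriber that treats "no post" as "server not routable". */
final class RoutingViewSketch {

    /** Builds the routable set from one subscriber update (server name -> status post). */
    static Set<String> routableServers(Map<String, String> update) {
        Set<String> routable = new TreeSet<>();
        for (Map.Entry<String, String> post : update.entrySet()) {
            // Only servers that appear in the update with a STARTED post are routable.
            if ("STARTED".equals(post.getValue())) {
                routable.add(post.getKey());
            }
        }
        return routable;
    }

    public static void main(String[] args) {
        Map<String, String> complete = new LinkedHashMap<>();
        complete.put("server1", "STARTED");
        complete.put("server2", "STARTED");

        // Same cluster, but the post from the running server2 was dropped from the update.
        Map<String, String> withDroppedPost = new LinkedHashMap<>(complete);
        withDroppedPost.remove("server2");

        System.out.println(routableServers(complete));        // [server1, server2]
        System.out.println(routableServers(withDroppedPost)); // [server1]
    }
}
```

In this model server2 is still running, yet it disappears from the routable set because its post is absent, which mirrors the symptom described above.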
    

Problem conclusion

  • The logic for sending bulletin board updates in the active
    coordinator was modified so that the entire state of all
    subscribed-to subjects is sent at the conclusion of bridge
    rebuild periods.
    
    The fix for this APAR is currently targeted for inclusion in
    fix packs 7.0.0.19 and 6.1.0.39.  Please refer to the
    Recommended Updates page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
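
The idea of resending the entire state at the end of a rebuild can be sketched as follows. This is an illustrative model only, not the actual HA manager implementation: the classes Coordinator and Subscriber and the methods beginRebuild, endRebuild, onDelta, and onSnapshot are hypothetical, and the sketch assumes that incremental updates posted during a rebuild never reach the subscriber.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "send the full subject state when the rebuild ends". */
final class FullStateResendSketch {

    /** Subscriber-side copy of one subject (server name -> status). */
    static final class Subscriber {
        final Map<String, String> view = new TreeMap<>();

        void onDelta(String server, String status) {
            view.put(server, status);
        }

        /** A snapshot replaces the whole view, so earlier lost deltas no longer matter. */
        void onSnapshot(Map<String, String> fullState) {
            view.clear();
            view.putAll(fullState);
        }
    }

    /** Coordinator-side authoritative state for the same subject. */
    static final class Coordinator {
        private final Map<String, String> posts = new HashMap<>();
        private final Subscriber subscriber;
        private boolean rebuilding;

        Coordinator(Subscriber subscriber) {
            this.subscriber = subscriber;
        }

        void beginRebuild() {
            rebuilding = true;
        }

        void post(String server, String status) {
            posts.put(server, status);
            // Assumption: a delta sent during a rebuild is lost, so it is modeled as not delivered.
            if (!rebuilding) {
                subscriber.onDelta(server, status);
            }
        }

        /** At the end of the rebuild, send the entire state instead of relying on deltas. */
        void endRebuild() {
            rebuilding = false;
            subscriber.onSnapshot(new HashMap<>(posts));
        }
    }

    public static void main(String[] args) {
        Subscriber sub = new Subscriber();
        Coordinator coordinator = new Coordinator(sub);

        coordinator.post("server1", "STARTED");
        coordinator.beginRebuild();
        coordinator.post("server2", "STARTED"); // lost while the rebuild is in progress
        System.out.println(sub.view);           // {server1=STARTED}

        coordinator.endRebuild();
        System.out.println(sub.view);           // {server1=STARTED, server2=STARTED}
    }
}
```

A full snapshot costs more than a delta, but it makes the subscriber's view independent of whatever was lost during the rebuild window.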
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM29179

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    61I

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-12-17

  • Closed date

    2011-03-23

  • Last modified date

    2011-03-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R60A PSY UP
  • R60H PSY UP
  • R60I PSY UP
  • R60P PSY UP
  • R60S PSY UP
  • R60W PSY UP
  • R60Z PSY UP
  • R61A PSY UP
  • R61H PSY UP
  • R61I PSY UP
  • R61P PSY UP
  • R61S PSY UP
  • R61W PSY UP
  • R61Z PSY UP
  • R700 PSY UP



Document information

More support for: WebSphere Application Server
General

Software version: 6.1

Reference #: PM29179

Modified date: 23 March 2011