IBM Support

PK64003: High Availability Manager support for Transparent bridge failover.

APAR status

  • Closed as program error.

Error description

  • HTTP404 (Unroutable Server) occurs during core group
    bridge rebuild periods.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  IBM WebSphere Application Server            *
    *                  V6.0.2 and V6.1 users of the core group     *
    *                  bridge                                      *
    ****************************************************************
    * PROBLEM DESCRIPTION: Core group data is lost during core     *
    *                      group bridge failover.                  *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    During core group bridge failover, some bulletin board data is
    unavailable in local and remote core groups until the
    remaining bridges can recover the data.  This can result in
    404s for the WebSphere Proxy or WebSphere Extended Deployment's
    On-Demand Router when routing to endpoints in non-local core
    groups during bridge failover.
    

Problem conclusion

  • Core group bridge failover will no longer result in missing
    bulletin board data when the custom property
    "IBM_CS_HAM_PROTOCOL_VERSION=6.0.2.31" is set on every core
    group of an access point group.
    
    The fix for this APAR is currently targeted for inclusion in
    fix packs 6.1.0.19 and 6.0.2.31.  Please refer to the
    Recommended Updates page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
    
    Internal WebSphere Application Server components, such as
    workload management (WLM) and the On Demand Routing features
    of WebSphere Virtual Enterprise, depend on cross-core group
    state to perform their product functions. Bridges provide
    the mechanism that is used to represent and manage the
    cross-core group state used by these internal components.
    Part of the management of this cross-core group state is to
    perform bridge state rebuilds whenever there is a change in the number
    of running bridges in a topology. The bridge state rebuild is
    the means by which bridges calculate the ownership and
    distribution of the cross-core group state among the running
    set of bridges. During bridge state rebuilds, cross-core
    group state can be moved between running bridges. This
    situation might cause the data to be temporarily unavailable
    until the bridge has completed the rebuild process.
    
    The common symptoms of this problem are:
    
    1. JNDI lookups failing.
    2. WebSphere proxy server or On Demand Router generating 503
    response codes immediately after a core group bridge has been
    started or stopped.
    3. CORBA exceptions immediately after a core group bridge has
    been started or stopped.
    4. The occurrence of the following
    ArrayIndexOutOfBoundsException:
    [7/9/08 17:12:20:749 EDT] 00000030 UserCallbacks E
    HMGR0142E: An error occurred in a component called back by
    the High Availability Manager. The exception is
    java.lang.ArrayIndexOutOfBoundsException
      at com.ibm.ws.cluster.propagation.bulletinboard.BBDescriptionManager.getOrderedBytes(BBDescriptionManager.java:618)
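
    Symptom 4 can be looked for in a server log with a short script.
    The following is a minimal sketch in Python, not an IBM tool: the
    function name and the entry-boundary pattern are assumptions
    modeled on the single sample entry shown above, and real logs may
    be laid out differently.

```python
import re

# Message ID, exception, and stack frame quoted in the symptom list.
HAM_CALLBACK_ERROR = "HMGR0142E"
EXCEPTION_NAME = "java.lang.ArrayIndexOutOfBoundsException"
STACK_FRAME = "BBDescriptionManager.getOrderedBytes"

# Assumed start of a log entry, modeled on the sample:
# "[7/9/08 17:12:20:749 EDT] 00000030 ..."
ENTRY_START = re.compile(r"^\[\d{1,2}/\d{1,2}/\d{2,4} [^\]]+\] [0-9a-fA-F]{8} ")


def find_bridge_rebuild_errors(log_text):
    """Return the log entries that match symptom 4 (hypothetical helper).

    An entry starts at a timestamped line and runs until the next one,
    so wrapped message and stack-trace lines stay with their entry.
    """
    entries = []
    current = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if ENTRY_START.match(stripped) and current:
            entries.append(current)
            current = []
        current.append(stripped)
    if current:
        entries.append(current)

    matches = []
    for lines in entries:
        # Join without separators so that a class name wrapped across
        # two lines is still matched as one token.
        joined = "".join(lines)
        if (HAM_CALLBACK_ERROR in joined
                and EXCEPTION_NAME in joined
                and STACK_FRAME in joined):
            matches.append(" ".join(lines))
    return matches
```

    Passing the contents of a log file to find_bridge_rebuild_errors
    returns one string per matching entry. An empty list only means
    that this particular symptom was not found in that file.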
    
    To avoid the temporary application outage that might occur
    during core group bridge failover, make sure that you are
    running the latest HAM protocol version. This requires that:
    
    - All 6.0.2 processes are running on 6.0.2.31 or later.
    
    - All 6.1 processes are running on 6.1.0.19 or later.
    
    - All 7.0 processes are running on 7.0.0.1 or later.
    
    - The core group custom property IBM_CS_HAM_PROTOCOL_VERSION
    has been set to 6.0.2.31 on all of your core groups.
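
    The four requirements above can be expressed as a small check.
    This is a sketch under stated assumptions: the input dictionaries
    are hypothetical and must be filled in by hand or by your own
    inventory tooling, and release lines other than 6.0, 6.1, and 7.0
    are treated as unknown because this document does not cover them.

```python
# Minimum fix pack per release line, taken from the list above.
MINIMUM_LEVELS = {
    (6, 0): (6, 0, 2, 31),
    (6, 1): (6, 1, 0, 19),
    (7, 0): (7, 0, 0, 1),
}
REQUIRED_PROTOCOL = "6.0.2.31"


def parse_version(text):
    """Turn '6.1.0.19' into (6, 1, 0, 19)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text):
    """True if the version is at or above the minimum for its release line."""
    version = parse_version(version_text)
    minimum = MINIMUM_LEVELS.get(version[:2])
    if minimum is None:
        # Release lines not named in this document are out of scope.
        raise ValueError("no minimum level known for " + version_text)
    return version >= minimum


def ready_for_protocol(process_versions, core_group_protocols):
    """Return a list of problems; an empty list means all checks pass.

    process_versions:     {process name: version string}      (hypothetical input)
    core_group_protocols: {core group name: value or None}    (hypothetical input)
    """
    problems = []
    for name, version in sorted(process_versions.items()):
        if not meets_minimum(version):
            problems.append("%s is at %s, below the required level"
                            % (name, version))
    for group, value in sorted(core_group_protocols.items()):
        if value != REQUIRED_PROTOCOL:
            problems.append("%s does not set IBM_CS_HAM_PROTOCOL_VERSION=%s"
                            % (group, REQUIRED_PROTOCOL))
    return problems
```

    Versions are compared as tuples of integers rather than as
    strings, so that 6.1.0.9 correctly sorts below 6.1.0.19.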
    
    If you are not running the latest high availability manager
    protocol version, complete the following steps to activate
    the latest high availability manager protocol:
    
    1) Ensure that your installations are running at the required
    service levels.
    
    2) Determine if the high availability manager is configured to
    use preferred coordinator servers. If the high availability
    manager is not configured to use preferred coordinator servers,
    you must manually determine which servers are currently
    acting as coordinators.
    
    3) Shut down all core group bridges and all preferred
    coordinator servers. The high availability manager will
    immediately select new coordinators to replace those that you
    shut down, but that scenario does not cause any problems.
    
    4) Repeat the following actions for each core group in your
    cells.
    
    a) In the administrative console, click Servers > Core Groups
    > Core group settings > CORE_GROUP_NAME > Custom properties.
    
    b) Specify IBM_CS_HAM_PROTOCOL_VERSION in the Name field
    and 6.0.2.31 in the Value field.
    
    c) Save your changes.
    
    5) Synchronize the configuration across the topology.
    
    6) Restart all of the preferred coordinator servers. The
    coordinator servers must complete the startup process before
    you go on to the next step.
    
    7) Restart all core group bridges in the topology.
    
    The topology is now using the 6.0.2.31 protocol.
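
    Step 4 can also be scripted instead of being repeated in the
    administrative console. The following is a hedged sketch in
    Python syntax intended for wsadmin with the Jython language
    option. It relies on the AdminConfig operations list,
    showAttribute, create, modify, and save. The function names are
    hypothetical, the use of "customProperties" as the attribute that
    holds core group custom properties is an assumption to verify in
    your environment, and the ID parsing assumes that configuration
    names contain no spaces. Steps 1 to 3 and 5 to 7 are not
    automated here.

```python
PROPERTY_NAME = "IBM_CS_HAM_PROTOCOL_VERSION"
PROPERTY_VALUE = "6.0.2.31"


def split_ids(text):
    """Split a bracketed ID list such as '[id1 id2]' into ['id1', 'id2'].

    Assumes that no ID contains a space.
    """
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return text.split()


def set_ham_protocol(admin_config):
    """Set the custom property on every core group, then save (step 4).

    admin_config is the wsadmin AdminConfig object, or a stand-in
    with the same methods when testing outside wsadmin.
    """
    changed = []
    for core_group in admin_config.list("CoreGroup").splitlines():
        core_group = core_group.strip()
        if not core_group:
            continue
        # Look for an existing property so that it is updated, not duplicated.
        existing = None
        props = admin_config.showAttribute(core_group, "customProperties")
        for prop in split_ids(props):
            if admin_config.showAttribute(prop, "name") == PROPERTY_NAME:
                existing = prop
        if existing is None:
            admin_config.create(
                "Property", core_group,
                [["name", PROPERTY_NAME], ["value", PROPERTY_VALUE]],
                "customProperties")
        else:
            admin_config.modify(existing, [["value", PROPERTY_VALUE]])
        changed.append(core_group)
    admin_config.save()
    return changed
```

    Inside a wsadmin session the call would be
    set_ham_protocol(AdminConfig). Passing the object in as a
    parameter keeps the sketch testable with a stand-in object. The
    configuration must still be synchronized and the coordinators
    and bridges restarted in the order given in steps 5 to 7.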
    
    Other considerations when configuring your core group bridges:
    
    1. All of the servers in a core group must be at a service
    level that supports the 6.0.2.31 high availability manager
    protocol (IBM_CS_HAM_PROTOCOL_VERSION=6.0.2.31). If a core
    group contains servers that are at earlier service levels,
    these servers should be put into separate core groups. These
    core groups can then be bridged to the core groups that
    support the new high availability manager protocol because
    core group bridges can still communicate with each other
    even if they are using different core group protocols.
    However, the bridges for the core groups that are using the
    new high availability manager protocol cannot fully leverage
    the transparent failover support that the protocol provides,
    because these bridges have to communicate with the bridges
    in the back-level core groups.
    Therefore, it is recommended that you upgrade the back-level
    core groups to a service level that supports the new high
    availability manager protocol if possible.
    
    2. Transparent bridge failover is designed to hold state data
    constant during core group bridge rebuilds along the state
    data path, which is the path that consists of the state
    provider, one core group bridge in each respective core group,
    and a state data consumer. Failure scenarios that involve core
    groups without any remaining active bridges might still result
    in temporary state outages.
    
    3. Whenever a change is made in the core group bridge
    configuration, including the addition of a new bridge or the
    removal of an existing bridge, you must fully shut down and
    then restart all core group bridges in the affected access
    point groups.
    
    4. Always ensure that there is at least one running bridge in
    each core group. Configuring two bridges in each core group
    allows for single failures and for periodic cycling of one
    bridge at a time from each core group. If all of the core group
    bridges in a core group are shut down, core group state from
    all foreign core groups is lost.
    
    5. It is recommended that bridges be configured in their own
    dedicated server process, and that these processes have their
    monitoring policy set for automatic restart.
    
    6. It is recommended that you always set the
    IBM_CS_WIRE_FORMAT_VERSION core group custom property to the
    highest value that is supported in your environment.
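
    Consideration 4 lends itself to a simple topology check. The
    sketch below is illustrative only: the input structure and the
    function name are hypothetical, and the running state of each
    bridge has to come from your own monitoring.

```python
def check_bridge_coverage(bridges_by_core_group):
    """Check bridge coverage per core group (hypothetical helper).

    bridges_by_core_group: {core group name: {bridge server name: is_running}}
    Returns (errors, warnings) as two lists of strings.
    """
    errors = []
    warnings = []
    for group, bridges in sorted(bridges_by_core_group.items()):
        running = [name for name, up in bridges.items() if up]
        if not running:
            # No running bridge: state from foreign core groups is lost.
            errors.append("%s has no running bridge" % group)
        if len(bridges) < 2:
            # One configured bridge cannot be cycled without an outage.
            warnings.append("%s has fewer than two configured bridges" % group)
    return errors, warnings
```

    For example, a core group with two configured bridges of which
    one is stopped produces no error and no warning, which matches
    the case of cycling one bridge at a time.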
    

Temporary fix

Comments

APAR Information

  • APAR number

    PK64003

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    60A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2008-04-07

  • Closed date

    2008-07-25

  • Last modified date

    2008-12-12

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R60A PSY UP

  • R60H PSY UP

  • R60I PSY UP

  • R60P PSY UP

  • R60S PSY UP

  • R60W PSY UP

  • R60Z PSY UP

  • R61A PSY UP

  • R61H PSY UP

  • R61I PSY UP

  • R61P PSY UP

  • R61S PSY UP

  • R61W PSY UP

  • R61Z PSY UP



Document information

More support for: WebSphere Application Server
General

Software version: 6.0

Reference #: PK64003

Modified date: 12 December 2008