IBM Support

PI17596: Reconnecting containers might immediately stop healthy containers during container reconnect.

APAR status

  • Closed as program error.

Error description

  • Reconnecting containers can abruptly stop healthy containers
    during container reconnect.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All WebSphere eXtreme Scale users running   *
    *                  embedded container and catalog servers in   *
    *                  WebSphere Application Server.               *
    ****************************************************************
    * PROBLEM DESCRIPTION: Containers on the primary catalog       *
    *                      server side of a brownout are           *
    *                      mistakenly being directed to perform    *
    *                      container reconnect.                    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    During container reconnect, stable containers can be torn down.
    For example, during a brownout, the containers on the side with
    the primary catalog server should keep running and should not be
    torn down. However, those containers are torn down and
    reconnected after the island containers finish reconnecting.
    The cause is a timing window that opens when the island
    containers have reconnected to the catalog server and coalesced
    into their own group, but have not yet coalesced into a group
    with the non-island servers. During that window, the island
    servers can send the catalog server a heartbeat or view that
    does not include the non-island servers. Because the non-island
    servers were not yet in the island server view, the catalog
    server mistakenly allowed that heartbeat to force them to
    reconnect.
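
    The timing window can be pictured with a small model. The sketch
    below is hypothetical and is NOT WebSphere eXtreme Scale code:
    CatalogModel, handle_view_buggy, and handle_view_guarded are
    invented names. It assumes the catalog server orders every
    registered server that is missing from a reported view to
    reconnect, and it shows one assumed guard (ignore a view reported
    by servers that are still reconnecting). The APAR does not say
    how the actual fix closes the window.

```python
"""Hypothetical model of the PI17596 timing window.

Not WebSphere eXtreme Scale code; all names are invented for illustration.
"""
from dataclasses import dataclass, field


@dataclass
class CatalogModel:
    # Servers the primary catalog server currently has registered.
    registered: set[str] = field(default_factory=set)
    # Servers the catalog server has told to perform container reconnect.
    reconnect_orders: list[str] = field(default_factory=list)

    def handle_view_buggy(self, reporter_view: set[str]) -> None:
        # Flawed rule (assumed): any registered server missing from the
        # reported view is treated as cut off and ordered to reconnect.
        for server in sorted(self.registered - reporter_view):
            self.reconnect_orders.append(server)

    def handle_view_guarded(self, reporter_view: set[str],
                            reconnecting: set[str]) -> None:
        # Assumed guard: a view reported by servers that are still
        # reconnecting is only partial, so it must not be used to
        # evict servers that are outside it.
        if reporter_view & reconnecting:
            return
        self.handle_view_buggy(reporter_view)


# Brownout: A and B stayed with the primary catalog server; the island
# containers C and D have coalesced only with each other so far.
island = {"C", "D"}

buggy = CatalogModel(registered={"A", "B", "C", "D"})
buggy.handle_view_buggy(island)
print(buggy.reconnect_orders)    # ['A', 'B']: healthy servers torn down

guarded = CatalogModel(registered={"A", "B", "C", "D"})
guarded.handle_view_guarded(island, reconnecting=island)
print(guarded.reconnect_orders)  # []: partial view ignored
```

    In this model the flaw is that a partial view is trusted as if it
    were complete; any real fix only needs to stop a view from an
    incompletely coalesced group from evicting servers outside it.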
    

Problem conclusion

  • The code was updated to remove the potential timing window.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI17596

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    860

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-05-09

  • Closed date

    2014-05-21

  • Last modified date

    2014-05-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R711 PSY

       UP

  • R850 PSY

       UP

  • R860 PSY

       UP

Document Information

Modified date:
21 May 2014