PI17596: Reconnecting containers might immediately stop healthy containers during container reconnect.

APAR status

Closed as program error.

Error description

Reconnecting containers can abruptly stop healthy containers
during container reconnect.

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:  All WebSphere eXtreme Scale users running   *
*                  embedded container and catalog servers in   *
*                  WebSphere Application Server.               *
****************************************************************
* PROBLEM DESCRIPTION: Containers on the primary catalog       *
*                      server side of a brownout are           *
*                      mistakenly being directed to perform    *
*                      container reconnect.                    *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
During container reconnect, stable containers can be torn down.
For instance, during a brownout, the containers on the side
with the primary catalog server should persist and not be torn
down. However, those containers are torn down and reconnected
after the island containers have finished reconnecting. The
cause is a timing window that opens after the island
containers have reconnected to the catalog server and
coalesced into their own group, but before they have coalesced
into a group with the non-island containers. During that
window, the island containers can send the catalog server a
heartbeat, or view, that does not contain the non-island
containers. The catalog server mistakenly allowed that
heartbeat to force the non-island containers to reconnect,
because they were not yet in the island containers' view.
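
The timing window described above can be illustrated with a small
toy model. This is only a sketch under stated assumptions: the
function names, the container names, and the guard shown here are
hypothetical and are not taken from the WebSphere eXtreme Scale
code or from the actual fix, which the APAR does not describe. The
sketch contrasts a rule that trusts any reported view with one
conceivable guard that ignores a view containing no container from
the primary catalog server side.

```python
# Toy model of the catalog server's reconnect decision.
# All names are hypothetical; this is not the product's implementation.

def containers_to_reconnect_naive(known, reported_view):
    """Flawed rule: any known container missing from the view must reconnect."""
    return set(known) - set(reported_view)


def containers_to_reconnect_guarded(known, stable, reported_view):
    """Assumed guard: ignore a view that holds no primary-side container."""
    view = set(reported_view)
    if not (view & set(stable)):
        # The view comes from a group that has not yet merged with the
        # primary-side group, so it says nothing about those containers.
        return set()
    return set(known) - view


stable = {"c1", "c2"}   # containers on the primary catalog server side
island = {"c3", "c4"}   # containers that were cut off and are reconnecting
known = stable | island

# Heartbeat sent after the island formed its own group, but before it
# merged with the non-island containers.
island_view = island

naive = containers_to_reconnect_naive(known, island_view)
guarded = containers_to_reconnect_guarded(known, stable, island_view)
merged = containers_to_reconnect_guarded(known, stable, known)
```

With the flawed rule, the healthy containers `c1` and `c2` are
selected for reconnect, which matches the reported symptom. With
the assumed guard, the island-only view selects nothing, and the
fully merged view also selects nothing.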

Problem conclusion

The code was updated to remove the potential timing window.

Temporary fix

Comments

APAR Information

APAR number
PI17596
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
860
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-05-09
Closed date
2014-05-21
Last modified date
2014-05-21

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702

Applicable component levels

R711 PSY
UP
R850 PSY
UP
R860 PSY
UP

Document Information

Modified date:
21 May 2014
