APAR status
Closed as program error.
Error description
Fast fail clients have delayed recovery after failover
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere eXtreme Scale users who are       *
*                 running with fast fail clients that have     *
*                 catalog servers and container servers        *
*                 failing at the same time.                    *
****************************************************************
* PROBLEM DESCRIPTION: During a failover where one or more     *
*                      catalog servers fail at the same time   *
*                      as one or more container servers,       *
*                      fast fail clients take 30 seconds or    *
*                      more to recover.                        *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A fast fail client has a very short requestRetryTimeout value, or
none at all, defined in the client properties or on the session.
Therefore, the client does not retry the same request after a
failure to route to the server.

When catalog servers and container servers fail at the same time,
the client-side code waits for a new list of catalog server
endpoints before trying to request new routing information from
the catalog server. This wait normally prevents the client from
calling a failed catalog server, a call that can itself lengthen
recovery. As a result, the recovery seems to be delayed even if
there is a valid catalog server to contact.

The WebSphere eXtreme Scale client logs show route table updates
after receiving catalog server bootstrap updates. For example:

[8/21/13 9:52:47:462 EDT] 00000061 LocationServi I CWOBJ2521I: The catalog server bootstrap addresses changed from host1:4809,host2:4809 to host1:4809.
[8/21/13 9:52:47:476 EDT] 00000038 ClusterStore I CWOBJ1132I: An updated routing entry for domain:grid:epoch domain3:GridC:1377093141770 was obtained from the catalog server.
[8/21/13 9:52:48:038 EDT] 00000061 LocationServi I CWOBJ2521I: The catalog server bootstrap addresses changed from host5:3809,host6:3809 to host5:3809.
[8/21/13 9:52:48:103 EDT] 00000038 ClusterStore I CWOBJ1132I: An updated routing entry for domain:grid:epoch domain2:GridB:1377093142734 was obtained from the catalog server.
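
To check a client log for this pattern, the CWOBJ2521I and CWOBJ1132I messages can be paired up with a small script. The following Python sketch is not part of the product or the fix. It assumes the log line layout shown in the example above, and the names parse_line and route_update_delays are hypothetical helpers introduced here for illustration. It reports how many milliseconds pass between each bootstrap address change and the next routing entry update.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the example lines in this APAR:
# [M/D/YY H:MM:SS:mmm TZ] thread component level MSGID: text
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+/\d+/\d+ \d+:\d+:\d+:\d+) \w+\]\s+"
    r"\w+\s+\S+\s+\w\s+(?P<msg_id>CWOBJ\d+[A-Z]):\s+(?P<text>.*)$"
)

BOOTSTRAP_CHANGED = "CWOBJ2521I"  # catalog server bootstrap addresses changed
ROUTE_UPDATED = "CWOBJ1132I"      # updated routing entry obtained


def parse_line(line):
    """Return (timestamp, message id, text), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # The time zone token is skipped by the regex; times are compared as-is.
    ts = datetime.strptime(match.group("ts"), "%m/%d/%y %H:%M:%S:%f")
    return ts, match.group("msg_id"), match.group("text")


def route_update_delays(lines):
    """Milliseconds from each bootstrap change to the next routing update."""
    delays = []
    pending = None  # timestamp of the most recent unmatched bootstrap change
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg_id, _ = parsed
        if msg_id == BOOTSTRAP_CHANGED:
            pending = ts
        elif msg_id == ROUTE_UPDATED and pending is not None:
            delays.append((ts - pending) // timedelta(milliseconds=1))
            pending = None
    return delays


# The four example lines from this APAR.
SAMPLE = [
    "[8/21/13 9:52:47:462 EDT] 00000061 LocationServi I CWOBJ2521I: The catalog server bootstrap addresses changed from host1:4809,host2:4809 to host1:4809.",
    "[8/21/13 9:52:47:476 EDT] 00000038 ClusterStore I CWOBJ1132I: An updated routing entry for domain:grid:epoch domain3:GridC:1377093141770 was obtained from the catalog server.",
    "[8/21/13 9:52:48:038 EDT] 00000061 LocationServi I CWOBJ2521I: The catalog server bootstrap addresses changed from host5:3809,host6:3809 to host5:3809.",
    "[8/21/13 9:52:48:103 EDT] 00000038 ClusterStore I CWOBJ1132I: An updated routing entry for domain:grid:epoch domain2:GridB:1377093142734 was obtained from the catalog server.",
]

if __name__ == "__main__":
    print(route_update_delays(SAMPLE))
```

The script only measures the gap between the two log messages. It does not measure the full recovery time that the application sees, so a short gap here alongside a long application outage is consistent with the client waiting for the bootstrap update before requesting routes.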
Problem conclusion
Apply the interim fix (ifix) to improve fast fail client recovery after a failure.
Temporary fix
Comments
APAR Information
APAR number
PM98008
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-09-27
Closed date
2013-10-14
Last modified date
2013-10-14
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702
Applicable component levels
R860 PSY
UP
Document Information
Modified date:
14 October 2013