APAR status
Closed as program error.
Error description
After repeated network failures and recovery, stale placement work can occur and cause incorrect shard movements, such as the demotion or recycling of a primary shard. Symptoms include a CWOBJ1524 message on a shard that is primary, or a primary shard that is demoted (CWOBJ1547I) by a request that originated from the container experiencing network problems (CWOBJ1575I).
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere eXtreme Scale users experiencing   *
*                 frequent network failures in a short amount  *
*                 of time where the next failure occurs        *
*                 before placement and replication complete    *
*                 from the prior recovery.                     *
****************************************************************
* PROBLEM DESCRIPTION: Repeated network failures cause primary *
*                      shard demotions and                     *
*                      TargetNotAvailableExceptions.           *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
After repeated network failures and recovery, stale placement work can occur and cause incorrect shard movements, including the demotion of a primary shard or the recycling of a primary shard. This can lead to a TargetNotAvailableException or to loss of data.

Symptoms of this problem include a CWOBJ1524 message on a shard that is primary, listing "Replica was disconnected from primary on containerName for an unknown length of time and must be reregistered to restart replication" as the reason to re-register. The CWOBJ1524 message results from a stale request sent by a primary shard running on the server that is experiencing network problems.

Alternatively, a primary shard is demoted to inactive and is not promoted again. The demotion is requested by a stale primary on the server that is experiencing network problems. In the following example, container1 is the container experiencing intermittent network problems:

[10/1/15 11:05:29:560 EDT] 000000d4 PrimaryShardI I CWOBJ1547I: PLATFORM:PLATFORM_MAPSET:9 (demoting primary to inactive) in transition.
[10/1/15 11:05:29:560 EDT] 000000d4 PrimaryShardI I CWOBJ1575I: Request to demote primary (PLATFORM:PLATFORM_MAPSET:9) originated from container container1.
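To spot the second symptom in a large log, it can help to list which container originated each demotion request. The following is a hypothetical diagnostic sketch, not an IBM tool. The class name DemotionOriginScanner and its scan method are invented for illustration, and the regular expression assumes the exact CWOBJ1575I wording shown in the example log lines. A shard whose demotion originated from a container known to have network problems is a candidate for this APAR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of WebSphere eXtreme Scale): scans log lines for
// CWOBJ1575I messages and records which container originated each demotion request.
public class DemotionOriginScanner {

    // Assumes the CWOBJ1575I wording from the APAR example:
    // "CWOBJ1575I: Request to demote primary (<shard>) originated from container <name>."
    private static final Pattern DEMOTE_ORIGIN = Pattern.compile(
            "CWOBJ1575I: Request to demote primary \\(([^)]+)\\) originated from container (\\S+?)\\.?$");

    // Returns shard ID -> originating container, in the order the shards were first seen.
    // If a shard is demoted more than once, the latest originating container wins.
    public static Map<String, String> scan(List<String> logLines) {
        Map<String, String> origins = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = DEMOTE_ORIGIN.matcher(line.trim());
            if (m.find()) {
                origins.put(m.group(1), m.group(2));
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "[10/1/15 11:05:29:560 EDT] 000000d4 PrimaryShardI I CWOBJ1547I: PLATFORM:PLATFORM_MAPSET:9 (demoting primary to inactive) in transition.",
            "[10/1/15 11:05:29:560 EDT] 000000d4 PrimaryShardI I CWOBJ1575I: Request to demote primary (PLATFORM:PLATFORM_MAPSET:9) originated from container container1.");
        // Prints {PLATFORM:PLATFORM_MAPSET:9=container1}
        System.out.println(scan(sample));
    }
}
```

Only CWOBJ1575I is parsed because it names both the shard and the originating container. The matching CWOBJ1547I line confirms the demotion to inactive but carries no container name.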
Problem conclusion
Stale placement work is now blocked. If the network fails and recovers repeatedly, faster than placement and replication can complete, extra shards can remain after recovery. If the extra shards persist, they can be removed by running the xscmd triggerPlacement command with the -removeExtra option.
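The APAR does not describe how stale placement work is blocked. One common general technique for rejecting stale requests is epoch (generation) fencing: every request is stamped with the epoch in which it was issued, the epoch advances on each recovery, and requests from an older epoch are refused. The sketch below illustrates only that generic technique under that assumption. It is not the WebSphere eXtreme Scale implementation, and the names PlacementFence, PlacementRequest, beginRecovery, newRequest, and accept are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical epoch-fencing sketch; NOT the WebSphere eXtreme Scale implementation.
// Requires Java 16+ for records.
public class PlacementFence {

    // A request to move or demote a shard, stamped with the epoch in which it was issued.
    public record PlacementRequest(String shardId, String originContainer, long epoch) {}

    // Current placement epoch; advanced each time recovery re-establishes placement.
    private final AtomicLong currentEpoch = new AtomicLong(0);

    // Called when a network failure has been recovered; returns the new epoch.
    public long beginRecovery() {
        return currentEpoch.incrementAndGet();
    }

    // Stamps a new request with the current epoch.
    public PlacementRequest newRequest(String shardId, String originContainer) {
        return new PlacementRequest(shardId, originContainer, currentEpoch.get());
    }

    // Rejects any request issued before the latest recovery (an older epoch),
    // so work queued by a container that lost connectivity cannot demote a current primary.
    public boolean accept(PlacementRequest request) {
        return request.epoch() >= currentEpoch.get();
    }

    public static void main(String[] args) {
        PlacementFence fence = new PlacementFence();
        PlacementRequest stale = fence.newRequest("PLATFORM:PLATFORM_MAPSET:9", "container1");
        fence.beginRecovery(); // the network recovers before the stale request is processed
        PlacementRequest fresh = fence.newRequest("PLATFORM:PLATFORM_MAPSET:9", "container2");
        System.out.println(fence.accept(stale)); // false
        System.out.println(fence.accept(fresh)); // true
    }
}
```

Fencing discards stale requests but does not undo placements already made. This matches the caveat above that extra shards may persist and need to be removed separately.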
Temporary fix
Comments
APAR Information
APAR number
PI50551
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
860
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2015-10-14
Closed date
2015-11-05
Last modified date
2015-11-05
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702
Applicable component levels
R860 PSY
UP
Document Information
Modified date:
05 November 2015