IBM Support

PI45880: After an unexpected timeout, old primary shards do not stop.

APAR status

  • Closed as program error.

Error description

  • When the promoting primary shard gets a timeout exception such a
    MessageTimeOutException communicating with the previous primary
    shard, but the previous primary shard is still available, the pr
    will start routing to the new promoted primary shard, but the pr
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  WebSphere eXtreme Scale users who           *
    *                  experience intermittent                     *
    *                  MessageTimeOutException exceptions and      *
    *                  see extra primary shards listed in the      *
    *                  xscmd showMapSizes output.                  *
    ****************************************************************
    * PROBLEM DESCRIPTION: After an unexpected timeout, old        *
    *                      primary shards are not stopped.         *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    If a promoted primary shard gets a MessageTimeOutException
    while communicating with the previous primary shard, but the
    previous primary shard is still running and available, the
    previous primary shard can be left running instead of being
    stopped and removed. The extra primary shard does not appear in
    the xscmd route table output, and clients route to the new
    primary shard. However, an extra primary shard can appear to be
    listed in the xscmd showMapSizes output.
    FFDC entries logged by ReplicatedPartition.becomePrimary also
    occur.
    

Problem conclusion

  • If the initial attempt to stop the previous primary shard fails,
    then the stop request is retried after an interval.
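
    The retry behavior described above can be pictured with a minimal
    sketch. This is an illustration only, not the eXtreme Scale
    implementation: the class StopRetrySketch, the StopRequest
    interface, the attempt limit, and the retry interval are all
    hypothetical names and values chosen for the example. The APAR
    states only that a failed stop request is retried after an interval.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of "retry the stop request after an interval". */
final class StopRetrySketch {

    /** Hypothetical stand-in for asking the previous primary shard to stop. */
    interface StopRequest {
        void send() throws TimeoutException;
    }

    /**
     * Sends the stop request and retries after a fixed interval on timeout.
     * Returns the number of attempts used, or -1 if no attempt succeeded
     * (all attempts timed out, or the thread was interrupted while waiting).
     */
    static int stopWithRetry(StopRequest request, int maxAttempts, long retryIntervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                request.send();
                return attempt; // the old primary acknowledged the stop
            } catch (TimeoutException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts; report failure below
                }
                try {
                    Thread.sleep(retryIntervalMillis); // wait before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulate a stop request that times out twice, then succeeds.
        AtomicInteger calls = new AtomicInteger();
        StopRequest flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new TimeoutException("simulated timeout");
            }
        };
        System.out.println("stopped after attempts: " + stopWithRetry(flaky, 5, 10L));
    }
}
```

    In this sketch a single timeout no longer leaves the old primary
    shard running indefinitely, because the stop request is sent again
    until it succeeds or the (assumed) attempt limit is reached.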
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI45880

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    860

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-07-29

  • Closed date

    2015-07-30

  • Last modified date

    2015-07-30

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R860 PSY

       UP

Document Information

Modified date:
30 July 2015