IBM Support

PI45242: Catalog servers can fail to start when clients are left running and all servers are stopped.

APAR status

  • Closed as program error.

Error description

  • Catalog servers can fail to start with several different types
    of failures, including OutOfMemoryError and CPU starvation.
    A large increase in the number of threads used by the catalog
    server may also be seen. These threads have a stack similar to
    the following:
    
    at com/ibm/ws/objectgrid/locks/Lock.acquire(Lock.java:131(Compiled Code))
    at com/ibm/ws/objectgrid/locks/Lock.lock(Lock.java:359(Compiled Code))
    at com/ibm/ws/objectgrid/locks/LockManager.lock(LockManager.java:256(Compiled Code))
    at com/ibm/ws/objectgrid/map/BaseMap.getLock(BaseMap.java:7207(Compiled Code))
    at com/ibm/ws/objectgrid/DiffMap.lookForKey(DiffMap.java:1382(Compiled Code))
    at com/ibm/ws/objectgrid/DiffMap.get(DiffMap.java:981(Compiled Code))
    at com/ibm/ws/objectgrid/ObjectMapImpl.get(ObjectMapImpl.java:451(Compiled Code))
    at com/ibm/ws/objectgrid/ObjectMapImpl.get(ObjectMapImpl.java:411(Compiled Code))
    at com/ibm/ws/objectgrid/server/naming/CommonLocationService.resolve(CommonLocationService.java:754(Compiled Code))
    at com/ibm/ws/objectgrid/server/naming/CommonLocationService.resolve(CommonLocationService.java:657(Compiled Code))
    at com/ibm/ws/objectgrid/server/naming/CommonLocationService.getPlacementService(CommonLocationService.java:943(Compiled Code))
    at com/ibm/ws/objectgrid/server/naming/XIOLocationService.getPlacementService(XIOLocationService.java:298(Compiled Code))
    at com/ibm/ws/objectgrid/server/catalog/placement/ReadOnlyCatalogServiceImpl$1.run(ReadOnlyCatalogServiceImpl.java:560(Compiled Code))
    at java/lang/Thread.run(Thread.java:857(Compiled Code))
    

Local fix

  • Do not restart all the catalog servers at once, and avoid having
    the containers down at the same time as the catalog servers.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  Users of eXtreme Scale who leave clients    *
    *                  running and restart the entire grid         *
    *                  including catalog servers. Users who use    *
    *                  near cache invalidation may see this        *
    *                  problem more prevalently.                   *
    ****************************************************************
    * PROBLEM DESCRIPTION: When clients contact a catalog server   *
    *                      before the startup process is           *
    *                      complete, it can fail to complete the   *
    *                      process.                                *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The catalog server was not correctly short-circuiting some
    functions, which caused the catalog server to be overwhelmed.
    The short-circuit logic was updated to prevent the catalog
    server from being overwhelmed with client traffic during
    server startup.
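
    The general idea of short-circuiting requests during startup can
    be sketched as follows. This is only an illustration under stated
    assumptions, not the eXtreme Scale implementation: the class
    StartupGate and its methods markStarted and tryResolve are
    hypothetical names that do not appear in the product. The sketch
    shows a request path that returns immediately while startup is
    incomplete, rather than running a lookup that could block on map
    locks and accumulate threads.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/**
 * Hypothetical startup gate (illustrative only; not product code).
 * Requests arriving before startup completes are rejected cheaply.
 */
final class StartupGate {
    // Flipped exactly once, when the (assumed) startup sequence finishes.
    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Signals that startup is complete and lookups may proceed. */
    void markStarted() {
        started.set(true);
    }

    /**
     * Runs the lookup only after startup has completed. Before that,
     * returns an empty result without invoking the lookup, so the
     * caller never waits on locks held by the startup work.
     */
    <T> Optional<T> tryResolve(Supplier<T> lookup) {
        if (!started.get()) {
            return Optional.empty(); // short circuit: no lock, no blocking
        }
        return Optional.ofNullable(lookup.get());
    }

    public static void main(String[] args) {
        StartupGate gate = new StartupGate();

        // Before startup completes, the lookup is skipped entirely.
        System.out.println(gate.tryResolve(() -> "placement").isPresent());

        gate.markStarted();

        // After startup completes, the lookup runs normally.
        System.out.println(gate.tryResolve(() -> "placement").get());
    }
}
```

    In a design like this, a client that receives the empty result
    would be expected to retry later, which keeps early client
    traffic from tying up server threads while startup is in progress.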
    

Problem conclusion

  • An interim fix is available for this APAR upon request.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI45242

  • Reported component name

    WS EXTREME SCAL

  • Reported component ID

    5724X6702

  • Reported release

    860

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-07-19

  • Closed date

    2015-09-29

  • Last modified date

    2015-09-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EXTREME SCAL

  • Fixed component ID

    5724X6702

Applicable component levels

  • R860 PSY

       UP

Document Information

Modified date:
29 September 2015