APAR status
Closed as program error.
Error description
A timing issue exists where the catalog cluster HAM/DCS view is at quorum level. However, the catalog static replication code is not fully wired before a primary shard, in the process of being promoted, attempts to write to the balance data grid. That write fails with a ReplicationVotedToRollbackTransactionException. You see an FFDC similar to the following example:

FFDC Exception:com.ibm.ws.xsspi.xio.exception.TransportException$Internal SourceId:com.ibm.ws.objectgrid.server.catalog.placement.CatalogServiceCommon.activate ProbeId:1547
Reporter:com.ibm.ws.objectgrid.server.catalog.placement.CatalogServiceCommon@1114fc5
com.ibm.ws.xsspi.xio.exception.TransportException$Internal [originating=10.193.8.92:10001;exid=0]: CWOBJ1688E: Unable to bind OBJECTGRID_PLACEMENT_SERVICE: rolling back transaction, see caused by exception
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at com.ibm.ws.xsspi.xio.exception.XIOExceptionFactory.createException(XIOExceptionFactory.java:58)
    at com.ibm.ws.objectgrid.server.naming.CommonLocationService.setPlacementService(CommonLocationService.java:671)
    at com.ibm.ws.objectgrid.server.naming.XIOLocationService.setPlacementService(XIOLocationService.java:375)
    at com.ibm.ws.objectgrid.server.catalog.placement.CatalogServiceCommon.doActivate(CatalogServiceCommon.java:961)
    at com.ibm.ws.objectgrid.server.catalog.placement.CatalogServiceCommon$1.run(CatalogServiceCommon.java:871)
    at java.lang.Thread.run(Thread.java:722)
Caused by: com.ibm.websphere.objectgrid.TransactionException: rolling back transaction, see caused by exception
    at com.ibm.ws.objectgrid.SessionImpl.rollbackPMapChanges(SessionImpl.java:2548)
    at com.ibm.ws.objectgrid.SessionImpl.commit(SessionImpl.java:2160)
    at com.ibm.ws.objectgrid.server.naming.CommonLocationService.setPlacementService(CommonLocationService.java:653)
    ... 4 more
Caused by: com.ibm.websphere.objectgrid.ReplicationVotedToRollbackTransactionException: BalanceGrid:ENTITY_MAPSET:0: Only 0 replicas voted to commit the transaction. Number of replicas voting: 0. Minimum required to commit: 3. Domain: null. A possible reason is a LifecycleFailedException during shard activation, check server logs for CWOBJ1209 messages
    at com.ibm.ws.objectgrid.SessionImpl.commit(SessionImpl.java:2041)
    ... 5 more
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All WebSphere eXtreme Scale customers who    *
*                 use quorum.                                  *
****************************************************************
* PROBLEM DESCRIPTION: Problems occur when you restart         *
*                      catalog servers and quorum is enabled.  *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A timing issue exists where the catalog cluster HAM/DCS view is at quorum level. However, the catalog static replication code is not fully wired before a primary shard, in the process of being promoted, attempts to write to the balance data grid. That write fails with a ReplicationVotedToRollbackTransactionException. The resulting FFDC is the example shown under Error description.
Problem conclusion
A code fix was delivered that accommodates the timing window by retrying the operation.
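The general idea of retrying a write that fails during a short timing window can be sketched as follows. This is only an illustration and not the delivered fix: the class RetrySketch, the exception TransientWriteException (a stand-in for a failure such as ReplicationVotedToRollbackTransactionException), and the parameters maxAttempts and delayMillis are all hypothetical names that do not come from the product.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry an operation that can fail while replication is still being wired. */
final class RetrySketch {

    /** Hypothetical stand-in for a transient failure; not a product class. */
    static final class TransientWriteException extends Exception {
        TransientWriteException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation up to maxAttempts times, sleeping delayMillis between
     * attempts. Only TransientWriteException triggers a retry; any other
     * exception propagates immediately.
     */
    static <T> T retry(Callable<T> operation, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientWriteException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (TransientWriteException e) {
                last = e; // remember the failure in case every attempt fails
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give the timing window a chance to close
                }
            }
        }
        throw last; // all attempts failed; surface the last transient error
    }
}
```

A caller would wrap the failing write in a Callable and pass it to RetrySketch.retry with a small attempt limit. Bounding the attempts keeps a persistent failure visible instead of hiding it behind an endless loop.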
Temporary fix
Comments
APAR Information
APAR number
PI18503
Reported component name
WS EXTREME SCAL
Reported component ID
5724X6702
Reported release
860
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-05-22
Closed date
2014-05-23
Last modified date
2014-05-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EXTREME SCAL
Fixed component ID
5724X6702
Applicable component levels
R850 PSY
UP
Document Information
Modified date:
23 May 2014