Recovery mechanism fails in WebSphere InterChange Server 4.3.0.6 after applying Limited Availability Interim Fix (LAIF) 7, 8, 9, or 10

Flash (Alert)


Abstract

Applying LAIF 7, 8, 9, or 10 on top of WebSphere InterChange Server (WICS) 4.3.0.6 causes the WICS recovery mechanism to fail.

Content

In 4.3.0.6 LAIF 7, an enhancement was added to the recovery API. This enhancement introduced a defect in the WICS recovery mechanism itself.

Symptoms:
When WICS restarts after a crash or an immediate shutdown, the controllers go into recovery but do not process any new messages or the messages that were in progress before the crash. The WICS startup logs also show an exception like the following:


[Time: 2010/09/07 15:28:18.764] [System: Server] [Thread: Thread-14 (#2033372332)] [Mesg: _Recovery failed. Reason: java.lang.NullPointerException
at Connector.BusObjManager.deliverBusObj(BusObjManager.java:4770)
at Connector.MsgDrvInterface.gotBusObj(MsgDrvInterface.java:165)
at Connector.MsgDrvInterface.receiverCallback(MsgDrvInterface.java:141)
at CxCommon.WIPServices.BOMEventReader.enquePersistedEvent(BOMEventReader.java:135)
at CxCommon.WIPServices.BOMEventReader.recoverEvents(BOMEventReader.java:168)
at CxCommon.Messaging.CommonListener.deliverEventsToCallback(CommonListener.java:517)
at CxCommon.Messaging.CommonListener.deliverOutstandingEvents(CommonListener.java:454)
at CxCommon.Messaging.MQSeries.CxMQListener.recoverEvents(CxMQListener.java:503)
at CxCommon.Messaging.MQSeries.CxMQSession.recoverEvents(CxMQSession.java:121)
at Connector.MsgDrvInterface.recoverEvents(MsgDrvInterface.java:334)
at Connector.BusObjManager.recoverEvents(BusObjManager.java:5454)
at Connector.BusObjManager.initDone(BusObjManager.java:1523)
at Connector.RecoveringControllerThread.run(RecoveringControllerThread.java:49)
at java.lang.Thread.run(Thread.java:570)

In some cases, the exception details are not logged, and only an error message like the following appears:

[Time: 2010/09/07 15:28:18.764] [System: Server] [Thread: Thread-14 (#2033372332)] [Type: Error] [MsgID: 191] [Mesg: Recovery failed. Reason .]

The controller remains stuck in recovery. The server rejects any attempt to start or stop the controllers and logs the following message instead:

[Time: 2010/09/07 15:31:57.358] [System: Server] [Thread: WT=1 (#2130054316)] [Type: Error] [MsgID: 14316] [Mesg: Failed to handle deactivate operation because the controller is performing recovery work.]
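
The symptoms above can be checked for mechanically. The following is a minimal sketch, assuming the log lines contain the same message text as the samples shown; the function name `find_recovery_symptoms` and the category labels are illustrative and not part of WICS.

```python
import re

# Message fragments copied from the sample log lines above.
# The category names are hypothetical labels chosen for this sketch.
SYMPTOM_PATTERNS = {
    "recovery_failed": re.compile(r"Recovery failed\. Reason"),
    "controller_stuck_in_recovery": re.compile(
        r"controller is performing recovery work"
    ),
}


def find_recovery_symptoms(lines):
    """Return (line_number, category) pairs for lines that match a symptom."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for category, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, category))
    return hits
```

Passing the lines of a startup log to `find_recovery_symptoms` returns an empty list when none of the messages are present.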

Resolution:
If you are using LAIF 7, 8, 9, or 10 on WICS 4.3.0.6, back out that LAIF and apply LAIF 11.

Note: You can find the installed LAIF version in the WICS startup logs, in a message like the following:
[Time: 2010/09/09 15:26:29.405] [System: Server] [Thread: main (#2018909695)] [Mesg: LAIF 10]
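
As an illustration, the LAIF level can be extracted from such a message and compared against the affected levels. This is a sketch only: it assumes the version message always has the form `[Mesg: LAIF <number>]` as in the sample above, and the names `find_laif_level` and `is_affected` are hypothetical helpers, not WICS tooling.

```python
import re

# Matches the version message format shown in the sample log line above.
LAIF_PATTERN = re.compile(r"\[Mesg: LAIF (\d+)\]")

# LAIF levels on WICS 4.3.0.6 that this alert names as affected.
AFFECTED_LEVELS = {7, 8, 9, 10}


def find_laif_level(lines):
    """Return the LAIF level from the last matching line, or None if absent."""
    level = None
    for line in lines:
        match = LAIF_PATTERN.search(line)
        if match:
            level = int(match.group(1))
    return level


def is_affected(level):
    """Return True if the given LAIF level is one of the affected levels."""
    return level in AFFECTED_LEVELS
```

The last match is used so that a log file spanning several restarts reports the most recently started level.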

Please contact IBM Support to obtain LAIF 11.

Steps to back out the affected LAIF:
1) Shut down WICS.
2) Back up the jars that were installed as part of the affected LAIF.
3) Replace them with the jars provided in LAIF 11.
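
The backup in step 2 could be scripted along the following lines. This is a hedged sketch, not an IBM-provided procedure: the function `backup_jars`, the directory arguments, and the jar names passed to it are all placeholders, and the real list of jars must come from what the affected LAIF actually installed. Run it only after WICS has been shut down.

```python
import shutil
from pathlib import Path


def backup_jars(lib_dir, backup_dir, jar_names):
    """Copy the named jars from lib_dir into backup_dir and return the copies.

    All arguments are supplied by the caller; nothing here is WICS-specific.
    """
    lib_dir = Path(lib_dir)
    backup_dir = Path(backup_dir)

    # Fail before copying anything if a listed jar is not present.
    missing = [name for name in jar_names if not (lib_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"jars not found in {lib_dir}: {missing}")

    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in jar_names:
        target = backup_dir / name
        shutil.copy2(lib_dir / name, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Checking for missing files first means a mistyped jar name is reported before any partial backup is written.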

LAIF 11 includes all the APARs delivered up to LAIF 10, plus the additional APAR JR37740, which fixes the problem with the recovery component.


Document information

More support for: WebSphere InterChange Server, ICS
Software version: 4.3.0.6
Operating system(s): All Platforms
Reference #: 1447170
Modified date: 2010-09-17
