Some WMQ queue managers are unable to initialize within a Queue Sharing Group
A particular WebSphere MQ (WMQ) queue manager cannot be started as a member of a Queue Sharing Group (QSG), although it may initialize successfully outside a QSG, or within a QSG on a different LPAR.
Queue manager start-up into the QSG terminates with the following messages:
CSQR002I ABCD RESTART COMPLETED
*CSQV086E ABCD QUEUE MANAGER ABNORMAL TERMINATION
IEA794I SVC DUMP HAS CAPTURED:
The queue manager had previously terminated abnormally.
Diagnosing the problem
Review the syslog and the queue manager job log. If the queue manager can initialize into a QSG on a different LPAR, message IXL014I on that LPAR shows a successful connection to the CSQ_ADMIN structure and the other structures.
In the failing scenario, a SLIP trap set on message CSQV086E captures a dump, and formatting that dump with IPCS VERBX LOGDATA shows an ABEND 5C6 with reason code 00C5101A issued by CSQEMTKN. This abend occurs when the failing queue manager's index value in the Peer Connection Table is invalid, and an invalid value prevents a successful connection to the CSQ_ADMIN list structure. One example of an invalid value is when this queue manager and another queue manager have been assigned the same queue-manager id.
This suggests that, at some earlier point, there may have been a problem removing this queue manager from the QSG.
The output from the CSQ5PQSG FORCE and ADD jobs showed that those remedial actions had completed successfully, so it was of no diagnostic value.
Resolving the problem
Typically, the remedial action is to use CSQ5PQSG REMOVE QMGR or FORCE QMGR to remove both queue managers that share the same id from the QSG, and then use CSQ5PQSG ADD QMGR to add back the queue manager that was already able to join the QSG successfully.
In this case, the typical remedial action did not clear the failing condition. The mismatch of queue-manager ids indicated that some failed-persistent connections to the CSQ_ADMIN structure might still exist under 'old' queue-manager ids. To check for these, issue the following display command:
D XCF,STR,STRNAME=structure name
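Because the display can list many connections, it may help to tabulate the rows and check them mechanically. The following is a minimal Python sketch, not an IBM tool: it assumes you have transcribed each row's CONNECTION NAME and STATE values by hand, and it assumes the queue-manager id is the trailing id_length characters of the connection name, so confirm the actual layout on your system. It lists FAILED-PERSISTENT connections and any queue-manager id that is shared by more than one connection. The connection names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Connection(NamedTuple):
    """One row transcribed by hand from the D XCF,STR display (hypothetical input)."""
    name: str   # value from the CONNECTION NAME column
    state: str  # value from the STATE column, e.g. ACTIVE or FAILED-PERSISTENT


def qmgr_id(conn: Connection, id_length: int) -> str:
    # ASSUMPTION: the queue-manager id is the last `id_length` characters
    # of the connection name; verify the real layout before relying on this.
    return conn.name[-id_length:]


def failed_persistent(conns: List[Connection]) -> List[Connection]:
    # Connections left behind by a member that ended abnormally and
    # were never cleaned up by restart or peer-level recovery.
    return [c for c in conns if c.state.upper() == "FAILED-PERSISTENT"]


def duplicate_ids(conns: List[Connection], id_length: int) -> Dict[str, List[str]]:
    # Group connection names by queue-manager id and keep only the ids
    # that appear on more than one connection.
    by_id: Dict[str, List[str]] = defaultdict(list)
    for c in conns:
        by_id[qmgr_id(c, id_length)].append(c.name)
    return {qid: names for qid, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical rows; real connection names and id length will differ.
    rows = [
        Connection("CSQEQSG1QM01", "ACTIVE"),
        Connection("CSQEQSG1QM02", "FAILED-PERSISTENT"),
        Connection("CSQEQSG1QX02", "ACTIVE"),
    ]
    print([c.name for c in failed_persistent(rows)])
    print(duplicate_ids(rows, id_length=2))
```

Taking pre-transcribed rows rather than parsing the raw console text keeps the sketch independent of the exact display layout, which varies with the message format on your system.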
The bottom of this output lists the current connections to the structure, including any that are failed-persistent. The queue-manager id is displayed as the last part of the value in the CONNECTION NAME column. Typically, connections from multiple queue managers with the same queue-manager id can be cleaned up by failing the CSQ_ADMIN structure, provided those connections are FAILED-PERSISTENT. In this case there were not multiple queue managers with the same queue-manager id, but the leftover connections could still be cleaned up by failing the CSQ_ADMIN structure with the command:
After the structure has been failed, it should be possible to add the failing queue manager to the QSG and start it.
Note that failed-persistent connections are normally the result of a queue manager terminating abnormally while it is a member of the QSG. Typically, restarting that queue manager cleans up such connections, or peer-level recovery by another active member of the group does so. If, however, the queue manager is removed from the QSG before either of these events takes place, the clean-up cannot occur. (Scenario: CSQ1 is removed from the QSG with CSQ5PQSG FORCE, but none of the other queue managers in the group were active at the time of its failure.)