Your WebSphere MQ cluster is having a problem and you need to know how to troubleshoot it. This document describes several cluster issues and how to address them.
Resolving the problem
Cluster Hints and Tips
- Avoid using the REFRESH CLUSTER command until you have exhausted other options.
- If you are running WebSphere MQ V184.108.40.206 or later, make sure you have a SYSTEM.CLUSTER.HISTORY.QUEUE in place before using REFRESH CLUSTER. If IBM support are involved in the problem, the history saved to this queue can help determine the root cause of your clustering issue.
- Make sure your cluster objects have either the CLUSTER or CLUSTERNL attribute set, depending on whether they appear in one or more clusters.
- When setting up or extending a cluster, you should define a CLUSSDR channel from each partial repository to a full repository, never the other way around.
- To move a queue manager to a new address, use SUSPEND QMGR to suspend it from the cluster, update the CONNAME field on its CLUSRCVR channel, then use RESUME QMGR to make it available in the cluster.
- If you move one full repository to a new address, make sure the other full repository is available to handle cluster activity during the move. Update the CONNAME on any manually defined CLUSSDR channels in the cluster after the full repository is available at its new address.
- Be aware that clustering allows you to send messages to queues elsewhere in the cluster; however, you can get messages only from queues on the local queue manager.
- Review the Ten Quick Tips for a Healthy MQ Cluster blog post on developerWorks.
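The address change described in the hints above can be sketched in MQSC as follows. This is only an illustration: the cluster name DEMO, the channel name TO.QM1, and the host and port are placeholders that do not come from this document. Substitute your own values, and run the commands on the queue manager that is moving.
Moving a queue manager to a new address (sketch with placeholder names)
* Suspend the queue manager so the cluster avoids routing new work to it
SUSPEND QMGR CLUSTER(DEMO)
* Point the cluster-receiver channel at the new address (placeholder host and port)
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) CONNAME('newhost.example.com(1414)')
* Make the queue manager available to the cluster again
RESUME QMGR CLUSTER(DEMO)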
Checking the Cluster Status
- Make sure the cluster repository process (amqrrmfa) for your queue manager is running. If this process has ended abnormally, restart the queue manager in order to get it running again.
- Make sure your cluster queue managers and cluster channels are in a good working status:
Checking cluster queue managers and channels
DISPLAY CLUSQMGR(*) ALL
DISPLAY CHSTATUS(*) WHERE(CHLTYPE EQ CLUSSDR) ALL
DISPLAY CHSTATUS(*) WHERE(CHLTYPE EQ CLUSRCVR) ALL
- If your cluster channels are not working, or your cluster queue managers show a "SYSTEM.TEMPUUID" value, which indicates a communications problem, review the WebSphere MQ channel troubleshooting page for advice on clearing up channel problems.
- Make sure you can see the cluster queues you are using:
Checking cluster queues
DISPLAY QCLUSTER('Your.Queue.Name') ALL
DISPLAY Q('Your.Queue.Name') CLUSINFO
- Be aware that partial repository queue managers will not display cluster queues which they have not accessed recently. If you run a program locally which accesses (MQOPENs) the cluster queue, you should then see it displayed.
Workload Balancing and Round-Robin Processing
- Your cluster queue should have the parameter DEFBIND set to NOTFIXED; otherwise, any program opening the queue will send all of its messages to the same instance of the queue rather than spreading them across the available instances.
- Any MQI application sending messages should use the MQOPEN option MQOO_BIND_NOT_FIXED for precisely the same reason.
- Any MQI application opening a cluster queue should leave the queue manager name empty in the object descriptor. If the application sets the MQOD.ObjectQMgrName field, then instances of the cluster queue on other queue managers will be ineligible to receive messages.
- If your queue manager has a local instance of a cluster queue, local applications will default to sending all of their messages to it. You can change this behavior by modifying the queue manager:
Changing the cluster message delivery behavior
DISPLAY QMGR CLWLUSEQ
ALTER QMGR CLWLUSEQ(ANY)
- Make sure your cluster channels are running properly in order to achieve an even distribution of messages. Use CLWLRANK rather than CLWLPRTY if you want the cluster workload algorithm to ignore cluster channel status when distributing messages to cluster queues.
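The queue and channel settings discussed in this section can be checked and changed with MQSC along the following lines. Treat this as a sketch: 'Your.Queue.Name' follows the placeholder used earlier, while the channel name TO.QM1 and the rank value 5 are arbitrary illustrations, not recommendations.
Checking and setting workload balancing attributes (sketch with placeholder names)
* Show how the queue binds messages and whether local instances are preferred
DISPLAY QLOCAL('Your.Queue.Name') DEFBIND CLWLUSEQ
* Allow messages from a single open to be spread across queue instances
ALTER QLOCAL('Your.Queue.Name') DEFBIND(NOTFIXED)
* Show the rank and priority currently set on a cluster-receiver channel
DISPLAY CHANNEL(TO.QM1) CLWLRANK CLWLPRTY
* Set a rank (0 to 9) on the channel; rank is evaluated regardless of channel status
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) CLWLRANK(5)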