Where's my message? Tool and instructions to use the MQ recovery log to find out what happened to your persistent MQ messages on distributed platforms

Technote (troubleshooting)


Problem(Abstract)

Operations performed on persistent messages passing through an IBM WebSphere MQ queue manager are logged. This document provides a tool and set of instructions to read that log data for a distributed MQ queue manager, extract the persistent messages, and see the transactions and operations performed (puts/gets/commits/rollbacks).

Symptom

Example situations where you might use this document:

  • An application team using persistent messages is asking "Where's my message?", or asserting "MQ lost my message", and you want to look back in time to see what happened to the message in MQ, so you can explain the situation to the application team and show that MQ was not at fault.
  • An application has processed a set of messages incorrectly, and needs to recover the original message payload in order to investigate the problem, or recover the business data.
  • An application performed, or is performing, a long running unit of work, and you want to look back and see what messages and operations are involved in that unit of work.
  • You require an audit log of transactions with persistent messages that pass through your MQ infrastructure, and wish to exploit the recovery logging built into the product for this purpose, rather than implementing an additional layer of auditing on top.

Warning: For persistent messages sent outside of syncpoint, there is a possibility that messages will not be written to the recovery log.

Environment

The tool and instructions contained in this document are provided in good faith and AS-IS. There is no warranty or further service implied or committed and the supplied tool is not supported via IBM product service channels.

Validation of the tool supplied in this technote has been performed with dmpmqlog output from MQ V7.0 and V7.5 on Windows, Linux x86-64 and AIX, using simple test data. It is important that you test it thoroughly in your environment before considering relying on it as part of any critical operational procedures.

Nonpersistent messages are not written to the MQ recovery log.

Persistent messages sent outside of syncpoint might not be written to the MQ recovery log in all circumstances.

If you wish to provide feedback on the features or operation of the tool, please do so via comments on this MQDev blog entry.


Diagnosing the problem

Three stages are required to analyze the MQ recovery logs:


Stage 1: Use the dmpmqlog tool supplied with MQ to extract the data you need from the MQ recovery log in a documented format

The following instructions assume you are using Linear logging. If you are using Circular logging, you can dump the entire contents of the recovery log by stopping your queue manager and using the -b option in dmpmqlog. However, it is impossible to recover persistent message data or transaction information from before the start of the active portion of the recovery log. For this reason, consider converting to Linear logging if you wish to use the steps in this document in the future for problem diagnosis or audit purposes.

To learn more about the differences between Circular and Linear logging, see the Types of logging section of the MQ documentation. Also see SupportPacs MO73 and MS0L for automated solutions to log file maintenance with Linear logging.

Part A (optional) - create a new 'dummy' queue manager

If the target queue manager under investigation is stopped and sufficient space is available for the dmpmqlog output, then you can skip to part B and run dmpmqlog directly against the target queue manager.

  1. Create a new 'dummy' queue manager on a machine with the same hardware platform and operating system.
    UNIX example:
    > crtmqm DMPMQLOG_DUMMY
    The same machine as the queue manager under investigation can be used (as is the case in the example commands). However, performing all of these steps on a machine running an active production queue manager introduces a risk of operational mistakes affecting production. Consider either stopping the target queue manager and running dmpmqlog against it directly (as described at the start of Part A), or creating the dummy queue manager on a remote machine and performing each of the copy steps below using SFTP or similar.
  2. Copy the queue manager object catalog from the target queue manager to the dummy queue manager, as described in the dmpmqlog section of the MQ documentation.
    UNIX example:
    > cp /var/mqm/qmgrs/QM_TARGET/qmanager/QMQMOBJCAT /var/mqm/qmgrs/DMPMQLOG_DUMMY/qmanager/QMQMOBJCAT
  3. Delete all of the active log extents from the dummy queue manager.
    UNIX example:
    > rm /var/mqm/log/DMPMQLOG_DUMMY/active/*.LOG
    Double-check that you supply the name of the dummy queue manager, not the real one.
  4. Advance the log on the target queue manager, using RESET QMGR in MQSC.
    This step ensures that all of the data you are interested in is contained in log extents that are not being actively updated by the queue manager.
    MQSC Example:
    RESET QMGR TYPE(ADVANCELOG)
       10 : RESET QMGR TYPE(ADVANCELOG)
    AMQ8649: Reset WebSphere MQ Queue Manager accepted.
  5. Identify the current log extent that the MQ queue manager is writing to using DISPLAY QMSTATUS in MQSC.
    MQSC Example:
    DIS QMSTATUS CURRLOG
        8 : DIS QMSTATUS CURRLOG
    AMQ8705: Display Queue Manager Status Details.
      QMNAME(MYQMGR)                          STATUS(RUNNING)
      CURRLOG(S0000016.LOG)
    In this example S0000016.LOG is the current log extent of the running queue manager. So log extents S0000015.LOG and earlier are eligible to be copied safely to the dummy queue manager in the next step.
  6. Copy all log extents you wish to extract message data from to the dummy queue manager.
    Use the timestamps of the extent files to help you choose the log extents you wish to extract message and transaction data from.
    UNIX example:
    > cd /var/mqm/log/QM_TARGET/active
    > cp S0000012.LOG S0000013.LOG S0000014.LOG S0000015.LOG /var/mqm/log/DMPMQLOG_DUMMY/active
    Bear in mind that the dmpmqlog output will be approximately five times as large as the original log data, and will be written to a single file. So if you have a large quantity of data to process, you might choose to repeat the remaining steps multiple times, for smaller chunks of the overall data.
  7. Copy the log header file from the target queue manager to the dummy queue manager.
    UNIX example:
    > cp /var/mqm/log/QM_TARGET/amqhlctl.lfh /var/mqm/log/DMPMQLOG_DUMMY/amqhlctl.lfh
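The copy in step 6 can be scripted. The following is a minimal sketch, assuming the default /var/mqm paths and the queue manager names used in the examples above (QM_TARGET and DMPMQLOG_DUMMY); the extent_name helper builds the S0000012.LOG-style names from extent numbers.

```shell
# Build linear-log extent file names: 'S' + 7-digit zero-padded
# number + '.LOG' (e.g. 12 -> S0000012.LOG).
extent_name() {
  printf 'S%07d.LOG' "$1"
}

FIRST=12   # first extent you want to dump
LAST=15    # one before the CURRLOG extent shown by DISPLAY QMSTATUS

n=$FIRST
while [ "$n" -le "$LAST" ]; do
  # Replace 'echo' with an actual 'cp' once you have verified the names.
  echo cp "/var/mqm/log/QM_TARGET/active/$(extent_name "$n")" \
          /var/mqm/log/DMPMQLOG_DUMMY/active/
  n=$((n + 1))
done
```

Printing the commands first (rather than copying immediately) gives you a chance to check the extent range against the file timestamps before touching anything.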

Part B (required) - use the dmpmqlog command to extract the logs

Once you have chosen whether to run the dmpmqlog command directly against the target queue manager, or a dummy queue manager with a copy of the logs, you can run the dmpmqlog command.

  1. Run the dmpmqlog command, specifying the first extent number that you wish to process.
    The dmpmqlog command will continue to run until it runs out of extent files.
    UNIX example:
    > dmpmqlog -m QMNAME -n 12 > /path/to/dmpmqlog.output.txt

Stage 2: Use the Java tool provided in this document to post-process the dmpmqlog output

The following Jar package contains the tool. It is a fully functional sample of how you could process the dmpmqlog output using the information documented in the Dumping the contents of the log using the dmpmqlog command section of the online MQ documentation.

dmpmqlog.scraper-20141201.jar

Revision history:
Jan 2014 - Version 20140114 - Initial version
Dec 2014 - Version 20141201 - Added options to disable payload output and set read charset

Some of the MQ data structures inside the hex-formatted message data in dmpmqlog output are undocumented, such as the compact MQMD structure used in Put records. The sample tool provides very simple processing for these structures, based on the offsets of key information in the structure, which can be determined by putting known data into a message: for example, the offset of the MQMD.MsgId, or the offset of the message payload.

These fixed offsets might change in any future release. So if you are trying the tool with a release newer than V7.5 and see unexpected results, then you might need to calculate the new offsets by sending your own test messages using a tool such as IH03. The offsets are specified in the dmpmqlog.scraper.properties file inside the Jar package, which you can edit and update inside the Jar archive without recompiling the tool.

You can use any Java 5 or later virtual machine to run the tool.

If you do not already have a Java 5 or later virtual machine installed, you can use the Java JRE supplied with IBM WebSphere MQ to run the tool.
Check you have installed the "Java JRE" component for your operating system as described here:
https://ibm.biz/BdRWY8
The location of the 'java' executable varies by platform:
Windows:
%MQ_INSTALL_ROOT%\java\jre\bin\java
AIX:
/usr/mqm/java/jre/bin/java
Linux 32bit:
/opt/mqm/java/jre/bin/java
Linux 64bit:
/opt/mqm/java/jre64/bin/java
Solaris:
/opt/mqm/java/jre/bin/java
HP-UX:
/opt/mqm/java/jre/bin/java
Here are example commands to run the tool against the dmpmqlog output obtained in Stage 1:
  • For output from machines with big-endian processors, such as POWER (AIX/Linux), System z (Linux), SPARC (Solaris) and PARISC (HP-UX):
    java -jar dmpmqlog.scraper-20141201.jar -b big-endian -i /path/to/in.dmpmqlog.txt -o /path/to/out/dir
  • For output from machines with little-endian processors, such as x86-64 (Linux, Windows, Solaris) and IA64 (HP-UX):
    java -jar dmpmqlog.scraper-20141201.jar -b little-endian -i /path/to/in.dmpmqlog.txt -o /path/to/out/dir
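If you are unsure which -b value to pass, the processor architecture of the machine that produced the logs decides it. Below is a sketch of a helper that maps common `uname -m` strings to the flag value; it only covers the architectures listed above, so treat anything else as unknown and check manually.

```shell
# Map a 'uname -m' machine string to the tool's -b argument.
# Covers only the architecture families named in this document.
endian_for_arch() {
  case "$1" in
    x86_64|amd64|i?86|ia64)      echo little-endian ;;
    ppc*|powerpc*|s390*|sparc*)  echo big-endian ;;
    *)                           echo unknown ;;
  esac
}

# Run this on the host of the target queue manager:
endian_for_arch "$(uname -m)"
```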

The tool creates a set of files in the specified output directory.

The summary.txt file contains the user operations performed within the log. The file is semicolon-delimited text with some simple width formatting, making it suitable for use in grep commands, text editors and spreadsheet programs.
Each operation has the following columns:

  • TXN
    A unique identifier for the transaction (or non-transacted single operation). This is the numeric LSN of the first operation in the transaction.
  • OPERATION
    "PUT", "GET", "PREPARE", "COMMIT", "ROLLBACK"
    The special operation name "INFLIGHT" is used at the end of the output to list the transactions that had work within the dmpmqlog output supplied, but did not commit.
  • LSN
    The LSN of the log record, so you can find it in the original dmpmqlog output
  • MSGID
    For PUT operations and GET operations (that correspond to a PUT within the log) this contains the MQMD.MsgId in HEX of the message
  • MSG_PUTTIME
    For PUT records, this contains the PutDate/PutTime of the message from the MQMD of the message.
    Note: this might not be the time the message was written to the queue.
  • CORRELID
    For PUT operations and GET operations (that correspond to a PUT within the log) this contains the MQMD.CorrelId in HEX of the message
  • GROUPID
    For PUT operations and GET operations (that correspond to a PUT within the log) this contains the MQMD.GroupId in HEX of the message
  • XTRANID
    This is the transaction identifier from the log record. Possible values include:
    • TranType: NULL
      The operation happened outside of syncpoint, so is committed immediately
    • TranType: MQI    TranNum{High 0, Low 45893}
      The operation happened within syncpoint, in a local MQ transaction. The identifier will be re-used within the log, so the TranNum is not unique. Use the TXN column instead to group the operations that happened inside the transaction.
    • XID:574153440000002400000036000001430BCE120C00000001000000F134721FAD52A950DDA913D08C5C13719A34E164F2000001430BCE120C00000001000000F134721FAD52A950DDA913D08C5C13719A34E164F2000000010000000000000000000000000001
      The operation happened within an XA (or JTA) coordinated transaction. The identifier is the big-endian serialisation of an XID. If you need to see the format, global transaction and branch qualifier identifiers separately, then search for the LSN in the original dmpmqlog output file.
  • PUTS
    For PUT operations, this is the MapIndex/QueueName of the message.
    For PREPARE/COMMIT/ROLLBACK/INFLIGHT records, this is the count of PUTs in the transaction, within the supplied data.
  • GETS
    For GET operations, this is the MapIndex/QueueName of the message.
    For PREPARE/COMMIT/ROLLBACK/INFLIGHT records, this is the count of GETs in the transaction, within the supplied data.
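
Because every line carries the TXN identifier in the first field, simple awk pipelines can summarise the file. This sketch counts operations per transaction from a few sample rows (real files carry more columns; only the first two fields are used here, and the rows are invented for illustration):

```shell
# Count OPERATION occurrences per TXN. The summary.txt fields are
# semicolon-delimited with space padding, so strip spaces from the
# operation name before counting.
txn_counts=$(awk -F';' '
  { gsub(/ /, "", $2); count[$1 ";" $2]++ }
  END { for (k in count) print k ";" count[k] }
' <<'EOF'
t00000000000302428444;GET      ;<0:0:4614:45844>
t00000000000302428444;ROLLBACK ;<0:0:4614:49642>
t00000000000302448148;GET      ;<0:0:4615:30>
t00000000000302448148;COMMIT   ;<0:0:4615:3558>
EOF
)
echo "$txn_counts"
```

Replace the here-document with your real summary.txt to get counts for the whole file.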

Individual files are also created for each message PUT within the supplied data.
Example filenames are as follows:

t00000000000054911432_414d512049494230315f514d202020201e4cab5220001e02.msgdata
t00000000000054911432_414d512049494230315f514d202020201e4cab5220001e02.msgprops
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TXN from summary.txt      The MQMD.MsgId from the message in HEX    

The *.msgdata files contain the payload of the message, including any headers that would be passed to an MQI application in the message body (such as an MQRFH2 header).

Where a corresponding *.msgprops file exists, this contains message properties that are handled as MQI message properties, in a binary MQ format that is not documented and not parsed by the tool.
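
Since the TXN and MsgId are encoded in the filename, a small helper can recover them, which is useful when scripting the cross-reference back to summary.txt. A sketch, assuming the <TXN>_<MsgId-hex>.msgdata layout shown above:

```shell
# Split a message-file name into its TXN and hex MsgId parts,
# following the <TXN>_<MsgId>.msgdata naming described above.
msgfile_txn() {
  f=${1##*/}         # drop any directory prefix
  echo "${f%%_*}"    # text before the first underscore
}
msgfile_msgid() {
  f=${1##*/}
  f=${f%.*}          # drop the .msgdata / .msgprops extension
  echo "${f#*_}"     # text after the first underscore
}
```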

Stage 3: Extract the business data you need from the output of the tool

The first step is usually to find the messages that you are interested in. For example you might grep for all files that contain a particular customer number, or business transaction ID, that you know is in the message. Once you have found the file containing the message data you are interested in, the filename tells you the MSGID and TXN (described in Stage 2 above).
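
A search for a known business value across the extracted payloads might look like the following sketch; the directory, file names and order number are made up for illustration:

```shell
# Create two sample payload files in a temporary directory, then use
# grep -l to list the files whose payload contains the value sought.
dir=$(mktemp -d)
printf 'ORDER=00000008;CUST=ACME'  > "$dir/t0001_aa01.msgdata"
printf 'ORDER=00000009;CUST=OTHER' > "$dir/t0002_bb02.msgdata"

matches=$(grep -l '00000008' "$dir"/*.msgdata)
echo "$matches"
```

If your payloads are binary or in a different codepage, you may need grep -a, or to convert the search term to the message's encoding first.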

So assuming you've found the MSGID of a request message you are interested in, here is an example using grep to find details of the request/reply round trip.

First grep for all instances of that message identifier in the summary.txt as follows:

Note: 'cut' has been used to limit the output to the fields we want to include, and the output has been re-formatted here to aid readability.

$ grep 414d51204d59514d47522020202020203c23b352200ed40b summary.txt | cut -f1,2,3,4,5,6 -d";"
t00000000000302118378;PUT      ;<0:0:4610:58542>;
 414d51204d59514d47522020202020203c23b352200ed40b;2013-12-20/16:24:40:820;
 303030303030303800000000000000000000000000000000

t00000000000302313253;GET      ;<0:0:4612:61221>;
 414d51204d59514d47522020202020203c23b352200ed40b;                       ;
 303030303030303800000000000000000000000000000000
t00000000000302313523;PUT      ;<0:0:4612:61491>;
 414d51204d59514d47522020202020203c23b352200ed419;2013-12-20/16:24:40:900;
 414d51204d59514d47522020202020203c23b352200ed40b

t00000000000302428444;GET      ;<0:0:4614:45844>;
 414d51204d59514d47522020202020203c23b352200ed419;                       ;
 414d51204d59514d47522020202020203c23b352200ed40b
t00000000000302448148;GET      ;<0:0:4615:30>   ;
 414d51204d59514d47522020202020203c23b352200ed419;                       ;
 414d51204d59514d47522020202020203c23b352200ed40b

In this case, we see a PUT of the request message with the MsgId under one transaction, then a GET+PUT of a reply message within the same transaction. These look normal.

However, then we see two GET records for the same reply MsgId. What we cannot see from this grep is whether the transactions here committed or rolled back - we can only see the GET/PUT operations. So the next step is to choose one of those transactions and do a grep:

$ grep t00000000000302428444 summary.txt | cut -f1,2,3,4,5,6 -d";"
t00000000000302428444;GET      ;<0:0:4614:45844>;
 414d51204d59514d47522020202020203c23b352200ed419;                       ;
 414d51204d59514d47522020202020203c23b352200ed40b
t00000000000302428444;ROLLBACK ;<0:0:4614:49642>;

So here we can see that the first transaction that attempted to get the message rolled back.
Looking at the second one, we see that it commits:

$ grep t00000000000302448148 summary.txt | cut -f1,2,3,4,5,6 -d";"
t00000000000302448148;GET      ;<0:0:4615:30>;
 414d51204d59514d47522020202020203c23b352200ed419;
 414d51204d59514d47522020202020203c23b352200ed40b
t00000000000302448148;COMMIT   ;<0:0:4615:3558>;

You might want to determine the time when the GET or ROLLBACK occurred. GET/COMMIT/ROLLBACK records do not contain a timestamp, because MQ does not record one to the log. So in order to establish the time window within which a log record occurred, you need to look at the PUT timestamps around that area. For example, open the file in a text editor, find the GET operation by its LSN, and then look above and below that line for PUT operations that contain timestamps.

This means the timing resolution of GET/COMMIT/ROLLBACK operations is dependent on how fast messages are being put to the queue manager. If messages are being put many times a second, then you will be able to time operations accurately. If messages are being put only once every few minutes, then you will only know the few-minute-window within which an operation happened.
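
The bracketing can be scripted too. The sketch below scans summary-style rows for a target LSN and reports the last PUT timestamp seen before it and the first one after it (fields as described in Stage 2; the sample rows are invented):

```shell
# Find the PUT timestamps (field 5) surrounding a given LSN (field 3).
range=$(awk -F';' -v lsn='<0:0:4614:49642>' '
  $2 ~ /PUT/ && $5 != "" { if (!seen) before = $5; else if (after == "") after = $5 }
  index($3, lsn)         { seen = 1 }
  END { print before " .. " after }
' <<'EOF'
t1;PUT      ;<0:0:4614:10>   ;m1;2013-12-20/16:24:40:820;c1
t2;GET      ;<0:0:4614:45844>;m2;;c2
t2;ROLLBACK ;<0:0:4614:49642>;;;
t3;PUT      ;<0:0:4615:99>   ;m3;2013-12-20/16:24:41:100;c3
EOF
)
echo "$range"
```

Point the awk command at your real summary.txt (instead of the here-document) and set lsn to the LSN of the operation you are investigating.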

Additional considerations when interpreting the output

MQ performs compaction of queues during shutdown, and when recording a media image of a queue. Compaction operations are logged as Put operations, with a null transaction. Be aware of this when reviewing summary output containing a queue manager restart or rcdmqimg, as these Put operations might look like duplication of messages.



Document information


More support for:

WebSphere MQ
Reliability

Software version:

7.0, 7.1, 7.5, 8.0

Operating system(s):

AIX, HP-UX, Linux, Solaris, Windows

Software edition:

All Editions

Reference #:

1660642

Modified date:

2014-12-02
