Parsers

A parser is a program that interprets the physical bit stream of an incoming message, and creates an internal logical representation of the message in a tree structure. The parser also regenerates a bit stream for an outgoing message from the internal message tree representation.

A parser is called when the bit stream that represents an input message is converted to the internal form that can be handled by the integration node; this invocation of the parser is known as parsing. The internal form, a logical tree structure, is described in Logical tree structure. It is described as a tree because messages are typically hierarchical in structure; a good example of this structure is XML. The way in which the parser interprets the bit stream is unique to that parser; therefore, the logical message tree that is created from the bit stream varies from parser to parser.
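The following minimal Python sketch illustrates the idea of parsing and writing, and shows how two parsers can produce different logical trees from the same bit stream. It is a toy model, not IBM Integration Bus code: the `TreeElement` class and the `parse_as_xml`, `parse_as_blob`, and `write_xml` functions are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TreeElement:
    """One node of a toy logical message tree."""
    name: str
    value: object = None
    children: list = field(default_factory=list)


def parse_as_xml(bit_stream: bytes) -> TreeElement:
    """Toy XML-style parser: one tree element per XML element."""
    def convert(node: ET.Element) -> TreeElement:
        text = (node.text or "").strip() or None
        return TreeElement(node.tag, text, [convert(child) for child in node])
    return TreeElement("XML", children=[convert(ET.fromstring(bit_stream))])


def parse_as_blob(bit_stream: bytes) -> TreeElement:
    """Toy BLOB-style parser: the whole bit stream is one opaque value."""
    return TreeElement("BLOB", children=[TreeElement("BLOB", bit_stream)])


def write_xml(tree: TreeElement) -> bytes:
    """Regenerate a bit stream from a tree that parse_as_xml built."""
    def convert(element: TreeElement) -> ET.Element:
        node = ET.Element(element.name)
        node.text = element.value
        for child in element.children:
            node.append(convert(child))
        return node
    return ET.tostring(convert(tree.children[0]))


bit_stream = b"<Customer><Name>Ann</Name><Account>42</Account></Customer>"
xml_tree = parse_as_xml(bit_stream)    # hierarchical tree: Customer, Name, Account
blob_tree = parse_as_blob(bit_stream)  # a single element that holds the raw bytes
```

In this sketch, the XML-style tree exposes the customer name and account as separate elements, whereas the BLOB-style tree holds the same bytes as one uninterpreted value.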

The parser that is called depends on the structure of a message, referred to as the message template. Message template information comprises the message domain, message model, message type, and physical format of the message. Together, these values identify the structure of the data that the message contains.

A parser is also called when a logical tree that represents an output message is converted into a bit stream; this action by the parser is known as writing. Typically, an output message is generated by an output node at the end of the message flow. However, you can connect more nodes to an output node to continue processing of the message.

The message domain identifies the parser that is used to parse and write instances of the message. The remaining parts of the message template (message model, message type, and physical format) are optional, and are used by model-driven parsers such as the MRM parser.
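As a rough illustration of how the message domain alone selects the parser, the following Python sketch models a message template with one required part and three optional parts. The `MessageTemplate` class, the `AVAILABLE_PARSERS` registry, the `select_parser` function, and the sample values `OrdersSet`, `Order`, and `CWF1` are all hypothetical and are not part of any IBM Integration Bus interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageTemplate:
    """The four parts of a message template; only the domain is required."""
    domain: str
    model: Optional[str] = None
    message_type: Optional[str] = None
    physical_format: Optional[str] = None


# Hypothetical registry: message domain -> name of the parser that handles it.
AVAILABLE_PARSERS = {
    "XMLNSC": "XMLNSC parser",
    "MRM": "MRM parser",
    "BLOB": "BLOB parser",
}


def select_parser(template: MessageTemplate) -> str:
    """Choose a parser from the domain alone; the other parts play no role here."""
    try:
        return AVAILABLE_PARSERS[template.domain]
    except KeyError:
        raise LookupError(
            f"no parser for message domain {template.domain!r}"
        ) from None


# A self-defining message needs only a domain.
self_defining = MessageTemplate(domain="XMLNSC")
# A modeled message also carries the optional parts for the model-driven parser.
modeled = MessageTemplate("MRM", model="OrdersSet", message_type="Order",
                          physical_format="CWF1")
```

The lookup fails for a domain that has no registered parser, which mirrors the requirement that a parser must be available for every message domain in use.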

The logical structure of the message typically maps to the business content of the message; for example, it contains a customer name, address, and account number. It is only when you send a message across a connection that the physical characteristics are important, and influence the construction of the bit stream.

The integration node requires access to a parser for every message domain to which your input messages and output messages belong. In addition, the integration node requires a parser for every identifiable message header that is included in the input or output message. Parsers are called when required by the message flow.

Body parsers

IBM® Integration Bus provides built-in support for messages in the following message domains by providing message body parsers:

MRM (MRM parser and domain)
XMLNSC, XMLNS, and XML (XML parsers and domains)
SOAP (SOAP parser and domain)
DataObject (DataObject parser and domain)
JMSMap and JMSStream (JMS parsers and domains)
MIME (MIME parser and domain)
BLOB (BLOB parser and domain)
IDOC (IDOC parser and domain)
JSON (JSON parser and domain)
DFDL (DFDL parser and domain)

See Which body parser should you use? for a discussion about which message body parser to use under what circumstances.

You specify which message domain to use for your message at the place in the message flow where parsing or writing is initiated.

To parse a message bit stream, you typically set the Message Domain property of the input node that receives the message. If you initiate the parse operation in ESQL instead, use the DOMAIN clause of the CREATE statement.
The message tree that is created is described in Message tree structure. Its exact form might change as it progresses through the message flow, depending on what the nodes are doing.

The last child element of the Root element of the message tree takes the name of the body parser that created the tree. For example, if the Message Domain property was set to MRM, the last child element of Root is called MRM, which indicates that the message tree is owned by the MRM parser.
To write a message, the integration node calls the owning body parser to create the message bit stream from the message tree.
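The relationship between the last child of Root and the owning body parser can be sketched in Python as follows. This is a simplified toy model under stated assumptions: the `TreeElement` class, the `WRITERS` table, and the output format that the stand-in MRM writer produces are invented for illustration and do not reflect the real parsers.

```python
from dataclasses import dataclass, field


@dataclass
class TreeElement:
    """One node of a toy logical message tree."""
    name: str
    value: object = None
    children: list = field(default_factory=list)


def build_message_tree(domain: str, body: list) -> TreeElement:
    """Build a toy tree: Properties first, a header, then the body folder last."""
    return TreeElement("Root", children=[
        TreeElement("Properties"),
        TreeElement("MQMD"),
        TreeElement(domain, children=body),  # body folder is named after the parser
    ])


def owning_body_parser(root: TreeElement) -> str:
    """The last child of Root names the parser that owns the message body."""
    return root.children[-1].name


# Hypothetical writers, one per domain, standing in for the real body parsers.
WRITERS = {
    "MRM": lambda body: b"MRM:" + ",".join(c.name for c in body.children).encode(),
    "BLOB": lambda body: body.children[0].value,
}


def write_message(root: TreeElement) -> bytes:
    """Dispatch to the writer of the parser that owns the body folder."""
    body = root.children[-1]
    return WRITERS[owning_body_parser(root)](body)


tree = build_message_tree(
    "MRM", [TreeElement("Name", "Ann"), TreeElement("Account", "42")]
)
```

Here `owning_body_parser(tree)` returns `"MRM"`, so `write_message` selects the MRM stand-in writer without any other configuration.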

Some body parsers are model-driven, which means that they use predefined messages from a message set when parsing and writing. The MRM, SOAP, DataObject, IDOC, and (optionally) XMLNSC parsers are model-driven parsers. To use these parsers, messages must be modeled in a message set and deployed to the integration node from the IBM Integration Toolkit.

Other body parsers are programmatic, which means that the messages that they parse and write are self-defining messages, and no message set is required. See Predefined and self-defining messages.

When you use a model-driven parser, you must also specify the message model and, optionally, the message type and message format so that the parser can locate the deployed message definition with which to guide the parsing or writing of the message.

To parse a message bit stream, you typically set the Message model, Message, and Physical format properties of the input node that receives the message. Or, if you are initiating the parse operation in ESQL, you use the SET, TYPE, and FORMAT clauses of the CREATE statement. This information is copied into the Properties folder of the message tree.

To write a message, the integration node calls the owning body parser to create the message bit stream from the message tree. If the parser is a model-driven parser, it uses the MessageSet, MessageType, and MessageFormat fields in the Properties folder.

Whether the message type or message format is needed depends on the message domain.
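The way a model-driven parser might use the Properties folder to find a deployed message definition can be sketched as a simple keyed lookup. This Python fragment is only an illustration: the `DEPLOYED_DEFINITIONS` table, both function names, the sample values, and the assumed one-to-one mapping of the input node properties onto the MessageSet, MessageType, and MessageFormat fields are assumptions, not documented behavior.

```python
# Hypothetical deployed definitions, keyed by
# (MessageSet, MessageType, MessageFormat).
DEPLOYED_DEFINITIONS = {
    ("OrdersSet", "Order", "CWF1"): ["OrderId", "Quantity"],
}


def properties_from_input_node(message_model, message, physical_format):
    """Copy the input node settings into a Properties folder (a dict here)."""
    return {
        "MessageSet": message_model,
        "MessageType": message,
        "MessageFormat": physical_format,
    }


def locate_definition(properties):
    """Find the deployed definition that guides parsing or writing."""
    key = (
        properties["MessageSet"],
        properties["MessageType"],
        properties["MessageFormat"],
    )
    try:
        return DEPLOYED_DEFINITIONS[key]
    except KeyError:
        raise LookupError(f"no deployed message definition for {key}") from None


properties = properties_from_input_node("OrdersSet", "Order", "CWF1")
definition = locate_definition(properties)
```

If the three fields do not match a deployed definition, the lookup fails, which corresponds to the parser being unable to locate a model for the message.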

Even if the body parser is not model-driven, it is good practice to create and use a message set in the IBM Integration Toolkit, because it simplifies the development of your message flow applications, even though the message set is not deployed in the IBM Integration Bus runtime environment. See Why model messages? for information about the advantages of creating a message set.

Header parsers

IBM Integration Bus also provides parsers for the following message headers, which your applications can include in input or output messages:

WMQ MQMD (The MQMD parser)
WMQ MQMDE (The MQMDE parser)
WMQ MQCFH (The MQCFH parser)
WMQ MQCIH (The MQCIH parser)
WMQ MQDLH (The MQDLH parser)
WMQ MQIIH (The MQIIH parser)
WMQ MQRFH (The MQRFH parser)
WMQ MQRFH2 and MQRFH2C (The MQRFH2 and MQRFH2C parsers)
WMQ MQRMH (The MQRMH parser)
WMQ MQSAPH (The MQSAPH parser)
WMQ MQWIH (The MQWIH parser)
WMQ SMQ_BMH (The SMQ_BMH parser)
JMS header (Representation of messages in the JMS Transport)
HTTP headers (HTTP headers)

All header parsers are programmatic and do not use a message set when parsing or writing.

User-defined parsers

To parse or write message body data or headers that the supplied parsers do not handle, you can create user-defined parsers that use the IBM Integration Bus user-defined parser programming interface.

Tip: No parser is provided for messages, or parts of messages, in the WMQ format MQFMT_IMS_VAR_STRING. Data in this format is often preceded by an MQIIH header (format MQFMT_IMS). IBM Integration Bus treats such data as a BLOB message. If you change the CodedCharSetId or the encoding of such a message in a message flow, the MQFMT_IMS_VAR_STRING data is not converted, and the message descriptor or preceding header does not correctly describe that part of the message. If you need the data in these messages to be converted, use the MRM domain and create a message set to model the message content, or provide a user-defined parser.

Root parsers

A root parser is the first parser in the logical tree structure built by the integration node. The root parser is determined by the input node that you use in your message flow; for example, the SOAPInput node uses a different root parser from the MQInput node. Each root parser creates different properties parsers to work with the Properties folder in the logical tree, because different input nodes need to obtain and serialize these values in transport-specific ways. The root parser that is assigned to different trees can cause different behavior when elements are copied between trees. The WebSphere MQ root parser is used when you create new trees in ESQL.

Parser copies

When you copy a tree from one parser to another, for example by using ESQL, the behavior depends on the types of the parsers that are involved.

Like parser copies

When a tree is copied between two parsers that are the same, for example in ESQL:

SET OutputRoot.XMLNSC.myTestData.Data = InputRoot.XMLNSC.myInputData.myData;

Then a "Like Parser Copy" is performed. In this instance, the integration node can be certain that the tree can be fully represented by the target parser and so all parser information is copied. The target parser has an identical copy of the tree that is taken from the source parser. All the parser structure under this element is retained and all attributes in the message are still represented as attributes in the target tree.

Unlike parser copies

When you copy a tree between two different parsers, for example in ESQL:

SET OutputRoot.DFDL.Data.Account = InputRoot.XMLNSC.Data.Account;

Then an "Unlike Parser Copy" is performed. In this instance, the integration node cannot know whether the source tree can be fully represented by the target parser, because parsers do not all support the same types of structure and content information. Instead of creating a copy of the source tree, the integration node must navigate down the logical structure, creating the elements by using the target parser. Therefore, the source parser structure is not maintained, and elements that were attributes in the source might not be attributes in the target tree. After an unlike parser copy, the target tree consists only of Name-Value pairs under the target parser with no other parser information.
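The difference between the two kinds of copy can be modeled with a short Python sketch. It is a deliberately simplified toy, not the integration node's implementation: the `TreeElement` class, its `kind` and `owner` fields, and the `copy_subtree` function are hypothetical, and the `kind` values stand in for whatever parser-specific information a real parser attaches to its elements.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TreeElement:
    """Toy tree element that carries parser-specific information."""
    name: str
    value: object = None
    kind: str = "NameValue"   # parser-specific type, for example "Attribute"
    owner: str = ""           # name of the parser that owns this element
    children: list = field(default_factory=list)


def copy_subtree(source: TreeElement, target_parser: str) -> TreeElement:
    """Copy a subtree into a tree that is owned by target_parser."""
    if source.owner == target_parser:
        # Like parser copy: the target can represent everything, so the
        # subtree is duplicated with all parser information intact.
        return copy.deepcopy(source)
    # Unlike parser copy: walk the logical structure and re-create each
    # element under the target parser as a plain name-value pair.
    return TreeElement(
        name=source.name,
        value=source.value,
        kind="NameValue",
        owner=target_parser,
        children=[copy_subtree(child, target_parser) for child in source.children],
    )


account = TreeElement("Account", kind="Folder", owner="XMLNSC", children=[
    TreeElement("id", "42", kind="Attribute", owner="XMLNSC"),
    TreeElement("Name", "Ann", kind="Field", owner="XMLNSC"),
])

like_copy = copy_subtree(account, "XMLNSC")   # attribute stays an attribute
unlike_copy = copy_subtree(account, "DFDL")   # only names and values survive
```

In this model, `like_copy` is identical to the source, whereas every element of `unlike_copy` keeps its name and value but loses its `Attribute` or `Field` marking.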

Note that any parsers that were children of the root element of the copied source tree are not preserved in the output tree. When serializing, you must ensure that trees created from unlike parser copies are constructed in such a way that a body parser can serialize the output message.