Planning user-defined parsers

Read about the concepts that you should consider before you develop a user-defined parser.

When you have considered the information provided here, and are ready to develop your own parser, use the instructions in Developing user-defined parsers to construct your parser.

Analysis

Before you start to create your own parser, be clear about its purpose. You can perform most tasks using the functions that are provided with IBM® Integration Bus, so you might not need to create a user-defined parser for your particular task.

Before you construct and implement a user-defined parser, consider the following questions:

Do you need to create a user-defined parser?
If the available parsers in IBM Integration Bus are not appropriate for your needs, define your own parser to parse internal, customer-specific, or generic commercial message formats.
Does IBM Integration Bus already provide a parser for the domain or message header?
See Parsers for details of message domains for which the supplied parsers can accept input messages, and message headers with which the supplied parsers can work.
Does the syntax of the in-house or commercial message dictate a format that can be parsed?
To parse the message successfully, does the parser need to interact with vendor software? If so, does the API that enables access to this software break your threading model?
Do you need to process multi-part, multi-format messages?
IBM Integration Bus does not support multi-part, multi-format messages. A multi-part MRM message must consist of messages that are all in the same format.
What type of parsing strategy will provide best performance?
IBM Integration Bus supports partial parsing, which allows your parser to parse only relevant fields in a message. Using partial parsing can save system resources.

Partial and full parsing

IBM Integration Bus supports partial parsing. If a message contains hundreds or even thousands of fields, parsing it completely requires considerable memory and processor resources. A message flow might reference only a few of these fields, or none at all, so it is inefficient to parse every input message completely. For this reason, IBM Integration Bus allows messages to be parsed on an as-needed basis. (This ability does not prevent a parser from processing the entire message in one step, and some parsers are written to work in this way.)

Each syntax element in a logical message has two bits that indicate whether all the elements on either side of the element are complete, and whether its children are complete. Parsing is typically completed in a bottom-to-top, left-to-right manner. When a parser has parsed the siblings that precede a particular element, and that element's first child, it sets the first completion bit to one. Similarly, when the pointer to the element's next sibling and the pointer to its last child are complete, the other completion bit is set to one.
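The two completion bits can be pictured with a small model. The following Python sketch is illustrative only: the Element class, its field names, and the needs_parsing helper are hypothetical, and do not represent the product's internal data structures or its parser API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical syntax element (an assumption, not a product structure)."""
    name: str
    children: List["Element"] = field(default_factory=list)
    # First bit: preceding siblings and the first child have been parsed.
    first_complete: bool = False
    # Second bit: the next-sibling and last-child pointers are complete.
    last_complete: bool = False

    def is_complete(self) -> bool:
        """An element is fully known only when both bits are set."""
        return self.first_complete and self.last_complete


def needs_parsing(element: Element) -> bool:
    """Navigating further from an incomplete element requires the parser."""
    return not element.is_complete()


body = Element("Body")
body.first_complete = True          # only the first bit is set so far
still_incomplete = needs_parsing(body)
```

In this model, an element with only one bit set is still treated as incomplete, so a request to navigate beyond it would trigger more parsing.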

In partial parsing, the integration node waits until a part of the message is referenced, and invokes the parser to parse that part of the message. Message processing nodes refer to fields within a message using hierarchical names. The name begins at the root of the message and proceeds down the message tree until the particular element is located. If an element is encountered without its completion bits set, and further navigation from this element is required, the appropriate parser entry point is called to parse the necessary part of the message. The relevant part of the message is parsed, appropriate elements are added to the logical message tree, and the element in question is marked as complete.

If you do not need to parse the full bit stream, you can use partial parsing. During partial parsing, a parser is called recursively until the requested element is returned, or until the message tree has been marked as complete, and the requested element is known not to exist.
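The on-demand behavior described above can be sketched in a few lines of Python. This example is a simplified illustration under stated assumptions: the LazyMessage class, the flat name=value; bit stream format, and the fields_parsed counter are all hypothetical, and the loop is iterative rather than recursive. It does not use the IBM Integration Bus parser API.

```python
from typing import Dict, Optional


class LazyMessage:
    """Hypothetical wrapper that parses 'name=value;' fields only on demand."""

    def __init__(self, bitstream: str) -> None:
        self._bitstream = bitstream
        self._offset = 0                      # next unparsed position
        self._fields: Dict[str, str] = {}     # fields parsed so far
        self.complete = False                 # True once the whole stream is parsed
        self.fields_parsed = 0                # how much work has been done

    def _parse_next_field(self) -> None:
        """Parse one more field, or mark the tree complete at end of stream."""
        end = self._bitstream.find(";", self._offset)
        if end == -1:
            self.complete = True
            return
        name, _, value = self._bitstream[self._offset:end].partition("=")
        self._fields[name] = value
        self._offset = end + 1
        self.fields_parsed += 1
        if self._offset >= len(self._bitstream):
            self.complete = True

    def get(self, name: str) -> Optional[str]:
        """Parse until the field is found or the message is complete."""
        while name not in self._fields and not self.complete:
            self._parse_next_field()
        # None means the tree is complete and the field does not exist.
        return self._fields.get(name)


msg = LazyMessage("id=7;type=order;qty=3;note=rush;")
order_type = msg.get("type")   # parses only the first two fields
```

Requesting a field that does not exist forces the rest of the bit stream to be parsed, because only a complete tree can show that the element is absent.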

Whether you choose to perform a full or partial parse depends on how the message will be processed. If most fields in the message are likely to be accessed during processing, parsing the whole message the first time it is accessed is typically more efficient, particularly for smaller messages.

However, if most fields in the message are unlikely to be accessed during processing, parsing only as far as necessary when a specific field is accessed is typically more efficient, particularly for larger messages.
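The trade-off can be estimated with a simple cost model. The following Python sketch assumes a parser that works left to right and stops at the furthest field that is accessed. The function names and the cost model are hypothetical simplifications; real costs also depend on the message format and on the overhead of each parser call.

```python
from typing import Iterable, Sequence


def fields_parsed_full(field_names: Sequence[str]) -> int:
    """A full parse always processes every field."""
    return len(field_names)


def fields_parsed_partial(field_names: Sequence[str],
                          accessed: Iterable[str]) -> int:
    """Assumed model: an on-demand, left-to-right parse stops at the
    furthest accessed field. A field that does not exist forces a parse
    of the whole message."""
    furthest = 0
    for name in accessed:
        if name not in field_names:
            return len(field_names)
        furthest = max(furthest, field_names.index(name) + 1)
    return furthest


names = [f"f{i}" for i in range(1000)]
early_cost = fields_parsed_partial(names, ["f2"])     # stops after three fields
late_cost = fields_parsed_partial(names, ["f999"])    # as costly as a full parse
```

Under this model, partial parsing saves the most when the accessed fields are near the start of a large message, and saves nothing when the last field is always referenced.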