IBM Integration Bus, Version 9.0.0.5 Operating Systems: AIX, HP-Itanium, Linux, Solaris, Windows, z/OS


Parsing strategies

You can use several parsing strategies during message flow development to reduce memory usage when you parse and serialize messages.

These strategies are as follows:
  • Identifying the message type quickly
  • Partial parsing
  • Opaque parsing
  • Avoiding unnecessary parsing

These strategies are described in detail in the following sections.

Identifying the message type quickly

It is important to recognize the message format and type as quickly as possible. In message flows that process multiple types of message, this identification can be a problem: the message is often parsed multiple times to confirm that it has the expected format. Avoid that extra parsing to reduce memory usage.

The message flow in Figure 1 contains a number of Filter nodes (Filter1 and Filter2) and several subflows (subflow1, subflow2, and so on), each containing more nodes. The message flow is complex and has a long critical path. As a result, messages are parsed multiple times.

Figure 1. A complex message flow diagram with multiple subflows
Figure 2. A less complex message flow diagram with Label nodes, and no subflows.

In the example in Figure 2, functions and procedures, ESQL parsing techniques, and dynamic routing are combined so that the flow uses the minimum number of nodes (not counting the error handling subflow nodes). The logic for each path is coded as a function or procedure and is called from the main procedure in the Compute node. This method avoids the repeated parsing of messages that occurs in the multiple Filter nodes and subflows of Figure 1. Because the flow has fewer nodes, it does less parsing and tree copying, which significantly reduces the processing cost.

Tip: Remember that when you split a large, complex flow into multiple smaller message flows, each smaller flow can release its memory when it finishes its processing, that is, after each large step of processing. Adopt a balanced approach between these strategies to get the best memory usage and performance.

Partial parsing

A message is parsed only when necessary to resolve the reference to a particular part of its content. An input message can be of any length, and parsing the entire message for only a specific part of content is not usually required. Partial parsing (also referred to as On-demand parsing) improves the performance of message flows, and reduces the amount of parsed data that is stored in memory. Partial parsing is used to parse an input message bit stream only as far as is necessary to satisfy the current reference.

To use partial parsing, you must set the Parse timing property on the input node to On Demand.

All the parsers that are provided with IBM® Integration Bus support partial parsing. The amount of parsing that must be performed depends on which fields in a message need to be accessed, and the position of those fields in the message. In the next two diagrams, one message has the fields ordered A to Z (Figure 3) and the other has them ordered Z to A (Figure 4). Depending on which field is needed, one of the cases is more efficient than the other. If you need to access field Z, the second case (Figure 4) is best, because Z is at the start of the message and the parser can stop there. Where you have influence over message design, ensure that information that is needed early, for routing for example, is placed at the start of the message and not at the end.

Figure 3. Diagram of a message that is ordered A to Z
Figure 4. Diagram of a message that is ordered Z to A
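The effect of field position on an on-demand parse can be sketched in Python. This is an illustration only, assuming a flat body with one element per letter; the function elements_parsed_until and the Body wrapper element are hypothetical. The sketch uses the standard xml.parsers.expat module and stops the parser as soon as the wanted element starts.

```python
import string
import xml.parsers.expat


class _Found(Exception):
    """Raised inside the handler to stop parsing as soon as the field is reached."""


def elements_parsed_until(data: bytes, wanted: str) -> int:
    """Return how many start tags the parser processed before reaching `wanted`."""
    parsed = 0

    def start_element(name, attrs):
        nonlocal parsed
        parsed += 1
        if name == wanted:
            raise _Found  # stop here; the rest of the message is never parsed

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    try:
        parser.Parse(data, True)
    except _Found:
        pass
    return parsed


def build(order: str) -> bytes:
    """Build a hypothetical message whose fields appear in the given order."""
    fields = "".join(f"<{c}>{c.lower()}</{c}>" for c in order)
    return f"<Body>{fields}</Body>".encode()


a_to_z = build(string.ascii_uppercase)        # like Figure 3
z_to_a = build(string.ascii_uppercase[::-1])  # like Figure 4

cost_z_in_a_to_z = elements_parsed_until(a_to_z, "Z")
cost_z_in_z_to_a = elements_parsed_until(z_to_a, "Z")
```

With these inputs, reaching Z in the A-to-Z message means processing every field before it, while in the Z-to-A message the parser stops after the wrapper element and the first field.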

When you use ESQL and Mapping nodes, the field references are typically explicit; that is, you have references such as InputRoot.Body.A. IBM Integration Bus parses only as far as the required message field to satisfy that reference, and the parser stops at the first instance. When you use the XPath query language, the situation is different. By default, an XPath expression searches for all instances of an element in the message, which implicitly means that a full parse of the message takes place. If you know that there is only one instance of an element in a message, you can optimize the XPath query to retrieve only the first instance. For example, use /aaa[1] if you want just the first instance of the search argument.
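The difference between an unqualified path and a path with a [1] predicate can be seen with the limited XPath subset in Python's xml.etree.ElementTree. This sketch shows only what each expression selects; ElementTree has already parsed the whole document by the time the query runs, so it does not demonstrate the parsing saving itself. The msg root element and the sample values are hypothetical, and the path is written relative to the root because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical message with three instances of the element aaa.
root = ET.fromstring("<msg><aaa>1</aaa><bbb/><aaa>2</aaa><aaa>3</aaa></msg>")

all_instances = root.findall("aaa")       # every instance must be located
first_instance = root.findall("aaa[1]")   # positional predicate: first instance only

all_values = [e.text for e in all_instances]
first_values = [e.text for e in first_instance]
```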

Opaque parsing

For XMLNSC messages, you can use opaque parsing, a technique that allows the whole of an XML subtree to be placed in the message tree as a single element.

Opaque parsing is supported for the XMLNS and XMLNSC domains only.

Use the XMLNSC domain in new message flows if you want to use opaque parsing. The XMLNS domain is deprecated, and offers a more limited opaque parsing facility than the XMLNSC domain. The XMLNS domain is provided only to support legacy message flows.

Figure 5. Diagram of some XML with the name section highlighted.
The entry in the message tree for an opaque element is the bit stream of that part of the original input message. This technique has two benefits:
  1. It reduces the size of the message tree because the XML subtree is not expanded into the individual elements.
  2. The cost of parsing is reduced because less of the input message is expanded as individual elements and added to the message tree.

You can use opaque parsing where you do not need to access the elements of the subtree. For example, you might need to copy a portion of the input tree to the output message without caring about its contents in this particular message flow: you accept the content of the subfolder and have no need to validate or process it in any way.

Specifying elements for opaque parsing

You must specify elements for opaque parsing in the Parser Options section of the Input node of the message flow, as shown in Figure 6:

Figure 6. A screen capture of the Parser Options tab of the MQInput Node properties dialog.

To specify elements for opaque parsing, add the element names to the Opaque elements table. Ensure that message validation is not enabled; if validation is enabled, opaque parsing is automatically disabled. Opaque parsing does not make sense with validation, because validation requires the whole message to be parsed.

Tip: Opaque parsing for the named elements occurs automatically when the message is parsed. It is not possible to use the CREATE statement to opaquely parse a message in the XMLNSC domain; only node options can be used to add opaque parsing.

Opaque parsing in action

With typical on-demand parsing, if you need to access fields in both the header and trailer sections of the message, the whole message must be parsed. In this example, you have a flow that needs to access only the <version> and <type> fields. The following message, abbreviated for clarity, shows where those fields are:
<tns:Inventory xmlns:tns="http://www.example.org/NewXMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.example.org/NewXMLSchema Inventory.xsd ">
 <tns:header>
  <tns:version>V100</tns:version>
 </tns:header>
 <tns:body>
  <tns:field1>tns:field1</tns:field1>
  <tns:field2>tns:field2</tns:field2>
  …..
  <tns:field1000>tns:field2</tns:field1000>
 </tns:body>
 <tns:trailer>
  <tns:type>tns:type</tns:type>
 </tns:trailer>
</tns:Inventory>
Using opaque parsing, you can eliminate the need to parse the body section of the payload. Specify the parent of the elements that you do not want to parse (in this example, the body element) in the Opaque elements table in the Parser Options section of the Input node of the message flow, as shown in Figure 7.
Figure 7. A screen capture of the Parser Options tab of the MQInput Node properties dialog. The body element is specified as opaque.

All elements that are defined in the Opaque elements list are treated as a single string when parsed. This parsing behavior is shown in Figure 8.

Figure 8. A screen capture of the parsed tree.
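The shape of the resulting tree can be imitated in Python. This is a rough sketch under stated assumptions, not how the XMLNSC parser is implemented: the function parse_with_opaque and the dict-based tree are hypothetical, and the message is a shortened version of the Inventory example with made-up field values. The underlying expat parser still reads the body; what the sketch shows is that no tree nodes are built for the content of the opaque element, which is kept as one raw byte string.

```python
import xml.parsers.expat

# Shortened, hypothetical version of the Inventory message.
MESSAGE = (
    b'<tns:Inventory xmlns:tns="http://www.example.org/NewXMLSchema">'
    b"<tns:header><tns:version>V100</tns:version></tns:header>"
    b"<tns:body><tns:field1>a</tns:field1><tns:field2>b</tns:field2></tns:body>"
    b"<tns:trailer><tns:type>T1</tns:type></tns:trailer>"
    b"</tns:Inventory>"
)


def parse_with_opaque(data: bytes, opaque_name: str):
    """Build a tree of dicts; keep the subtree of `opaque_name` as raw bytes."""
    parser = xml.parsers.expat.ParserCreate()
    stack = [{"name": "#document", "children": [], "raw": None}]
    opaque = {"start": None, "depth": 0}
    nodes_built = 0

    def start_element(name, attrs):
        nonlocal nodes_built
        if opaque["start"] is not None:
            opaque["depth"] += 1   # inside the opaque subtree: build nothing
            return
        node = {"name": name, "children": [], "raw": None}
        nodes_built += 1
        stack[-1]["children"].append(node)
        stack.append(node)
        if name == opaque_name:
            # Byte offset of the '<' that opens the opaque element.
            opaque["start"] = parser.CurrentByteIndex

    def end_element(name):
        if opaque["start"] is not None and opaque["depth"] > 0:
            opaque["depth"] -= 1   # closing a skipped descendant
            return
        node = stack.pop()
        if opaque["start"] is not None:
            # CurrentByteIndex points at '</'; include the closing tag.
            end = data.index(b">", parser.CurrentByteIndex) + 1
            node["raw"] = data[opaque["start"]:end]
            opaque["start"] = None

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(data, True)
    return stack[0]["children"][0], nodes_built


root, nodes_built = parse_with_opaque(MESSAGE, "tns:body")
body = root["children"][1]
```

In this sketch the body node has no children and carries the original bytes in its raw entry, while the header and trailer are expanded as usual.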

When you design a message structure, group elements according to how they need to be parsed if you have the opportunity; doing so can greatly improve performance. In the previous example, if you move the <type> field into the header, there is no need for opaque parsing, because the on-demand parser does not need to go past the header.

Avoiding unnecessary parsing

One effective technique to reduce the cost of parsing is not to parse.

The strategy is to avoid having to parse some parts of the message as shown in Figure 9.

Figure 9. Diagram of a message that is ordered A to Z. Field C was found.

For example:

You have a message routing message flow that needs to look at a field to make a routing decision. If that field is in the body of the message, then the body of the incoming message must be parsed to get access to it. The processing cost varies depending on which field is needed:
  • If it is field A, then it is right at the beginning of the body and would be found quickly.
  • If it is field Z, then the parser must work through the rest of the body to reach it, and the cost can be much higher, especially if the message is several megabytes in size.
Here is a technique to reduce this cost:

Use the application that created this message to copy the field that is needed for routing into a header within the message. For a WebSphere® MQ message, this field might be in an MQRFH2 header; for a JMS message, it might be a JMS property. If you use this technique, it is no longer necessary to parse the message body, potentially saving a large amount of processing effort. The MQRFH2 header or JMS Properties folder still needs to be parsed, but it contains a smaller amount of data. The parsers in this case are also more efficient than the general parser for a message body because the structure of the header is known. Copy key data to the MQMD, MQRFH2, or JMS Properties to prevent parsing the user data.
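The routing technique can be sketched in Python with a simplified message model. Everything named here is hypothetical and chosen for illustration: the Message class, the Region routing field, and the QUEUE.* destination names. The properties dictionary stands in for an MQRFH2 folder or JMS properties, and the body_parses counter shows whether the body had to be parsed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Message:
    properties: dict      # stands in for MQRFH2 or JMS properties
    body: bytes           # unparsed message body
    body_parses: int = 0  # counts how often the body was parsed

    def parsed_body(self) -> ET.Element:
        self.body_parses += 1
        return ET.fromstring(self.body)


def route(msg: Message, routing_key: str = "Region") -> str:
    """Choose a destination, reading the body only if the header lacks the key."""
    value = msg.properties.get(routing_key)
    if value is None:
        # Fallback: the sending application did not copy the field to the header.
        value = msg.parsed_body().findtext(routing_key)
    return f"QUEUE.{value}"


body = b"<Order><Item>x</Item><Region>US</Region></Order>"
with_header = Message(properties={"Region": "EU"}, body=body)
without_header = Message(properties={}, body=body)

dest_with_header = route(with_header)
dest_without_header = route(without_header)
```

When the sending application has copied the field into the properties, the destination is chosen without touching the body; otherwise the sketch falls back to parsing the body to find the field.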


        Last updated: 2016-08-12 11:20:23