IBM Integration Bus, Version 9.0.0.5 Operating Systems: AIX, HP-Itanium, Linux, Solaris, Windows, z/OS

Large messages: Tricky trailer records

Message data is typically large when it contains many repeating records, and these repeating records might be enclosed by header and trailer records. The header record gives origin or batch information about the batch of records that is being received, and the trailer record gives summary information about the batch as a whole, such as item counts or total prices.

In a routing application, a message flow might need to examine the header and trailer records before routing. In augmentation scenarios, a message flow might need to update only the trailer record. To get to the trailer record in each case, all repeating record instances must be parsed. For these types of scenario, different approaches must be considered when you use the XMLNSC or DFDL domains. The aim of the different approaches is to reduce the number of fields that are handled in the message tree.

XMLNSC supports opaque elements. This capability is covered in Opaque parsing in Parsing strategies. That section describes the opaque parsing functionality, which enables XPaths to be entered for elements that are not intended to be "inflated" in the message tree. That is, instead of parsing all the fields for the named record, a single opaque field is added to the message tree. This opaque field contains the portion of the bitstream that would otherwise be parsed for this record. If the XPath identifies all the repeating records between a header and trailer record, no individual fields are created for any of those records.
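The effect of an opaque element can be illustrated outside the product with a minimal Python sketch. This is an analogy only, not how the XMLNSC parser is implemented: the function name parse_with_opaque, the sample element names, and the use of a plain element name instead of an XPath are all assumptions made for the illustration. The sketch uses the standard xml.parsers.expat module to record header and trailer fields, while each Record element is kept as one raw slice of the input bytes.

```python
import xml.parsers.expat


def parse_with_opaque(data: bytes, opaque_name: str):
    """Return (fields, opaque_slices) for an XML document.

    Hypothetical helper: elements named opaque_name are not inflated;
    each one is stored as the raw bytes it occupies in the input.
    """
    parser = xml.parsers.expat.ParserCreate()
    fields = []   # (path, text) pairs for inflated leaf elements
    opaque = []   # raw byte slices, one per opaque element
    stack = []    # names of the currently open inflated elements
    text = []     # character data of the current inflated element
    state = {"start": None, "depth": 0}

    def start(name, attrs):
        if state["start"] is not None:
            state["depth"] += 1          # child of an opaque element: skip
            return
        if name == opaque_name:
            state["start"] = parser.CurrentByteIndex
            state["depth"] = 0
            return
        stack.append(name)
        text.clear()

    def end(name):
        if state["start"] is not None:
            if state["depth"] > 0:
                state["depth"] -= 1
                return
            # CurrentByteIndex points at the start of the end tag.
            close = data.index(b">", parser.CurrentByteIndex) + 1
            opaque.append(data[state["start"]:close])
            state["start"] = None
            return
        value = "".join(text).strip()
        if value:
            fields.append(("/".join(stack), value))
        text.clear()
        stack.pop()

    def chars(content):
        if state["start"] is None:
            text.append(content)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return fields, opaque


SAMPLE = (
    b"<Batch><Header><BatchNumber>B001</BatchNumber></Header>"
    b"<Record><F1>a</F1><F2>b</F2></Record>"
    b"<Record><F1>c</F1><F2>d</F2></Record>"
    b"<Trailer><TotalRecord>2</TotalRecord></Trailer></Batch>"
)

fields, opaque = parse_with_opaque(SAMPLE, "Record")
```

In this sketch, only the BatchNumber and TotalRecord fields are inflated; the two Record elements are held as two byte slices, whatever their internal structure.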

For model-based domains such as DFDL and XMLNSC, the model determines the structure of the message tree that is parsed. When you parse a large message with a repeating record that does not need to be inflated, memory can be saved by constructing an alternative model to represent the sparse message. This alternative model has just a single field for a record instead of all of its detailed fields. The single field is defined to consume the same bitstream content as the detailed version of the record.

The following example is a simple DFDL fixed-length model:

myHeader
 - BatchNumber: String: Length=16
myRecord: minOccurs=1, maxOccurs=unbounded
 - Field1: String: Length=1
 - Field2: String: Length=1
 - Field3: String: Length=1
 - Field4: String: Length=1
 - Field5: String: Length=1
myTrailer
 - TotalRecord: Int: Length=8

In this example, the myRecord parent field could repeat millions of times. Even with large message handling techniques, the five fields of every repeating record must be parsed and then deleted. The following model can be used instead, to reduce memory usage:

myHeader
 - BatchNumber: String: Length=16
myRecord: hexBinary: Length=5, minOccurs=1, maxOccurs=unbounded
myTrailer
 - TotalRecord: Int: Length=8

In this example, each myRecord instance is still parsed and deleted, but the parser does not have to parse and inflate the individual fields of each record. This technique not only saves memory but also improves the performance of a large parse, especially when each repeating record itself has a complex or large structure. However, if the message flow needs the data inside the records, there is no performance saving, because the records must still be parsed in full.
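The difference between the two models can be sketched in Python by walking a fixed-length message and counting the leaf fields that each model would create. This is an illustration under stated assumptions, not DFDL parser behavior: the function name parse_batch and the sample data are hypothetical, the data is assumed to be ASCII, and the end of the repeating records is assumed to be found from the total message length.

```python
HEADER_LEN = 16    # BatchNumber
RECORD_LEN = 5     # Field1 to Field5, one character each
TRAILER_LEN = 8    # TotalRecord


def parse_batch(msg: bytes, inflate: bool):
    """Walk the message front to back.

    Returns (batch_number, total_record, leaf_fields), where leaf_fields
    counts the leaf fields created while parsing (hypothetical metric).
    """
    body_len = len(msg) - HEADER_LEN - TRAILER_LEN
    if body_len < RECORD_LEN or body_len % RECORD_LEN != 0:
        raise ValueError("message length does not match the model")

    batch_number = msg[:HEADER_LEN].decode("ascii").rstrip()
    leaf_fields = 1                      # BatchNumber

    pos = HEADER_LEN
    end_of_records = len(msg) - TRAILER_LEN
    while pos < end_of_records:
        chunk = msg[pos:pos + RECORD_LEN]
        if inflate:
            # Detailed model: five one-character fields per record.
            record = [chr(b) for b in chunk]
            leaf_fields += len(record)
        else:
            # Sparse model: one hexBinary-style field per record.
            record = chunk
            leaf_fields += 1
        del record                       # parsed, then discarded
        pos += RECORD_LEN

    total_record = int(msg[end_of_records:].decode("ascii"))
    leaf_fields += 1                     # TotalRecord
    return batch_number, total_record, leaf_fields


# Sample message: header, three records, trailer (assumed data).
message = b"BATCH-0001".ljust(HEADER_LEN) + b"ABCDE" * 3 + b"00000003"

detailed = parse_batch(message, inflate=True)
sparse = parse_batch(message, inflate=False)
```

Both calls return the same header and trailer values. For the three sample records, the detailed walk creates 17 leaf fields and the sparse walk creates 5, and the gap grows with the number of records and with the number of fields in each record.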

Last updated: 2016-08-12 11:20:23