IBM Integration Bus, Version 9.0.0.5 Operating Systems: AIX, HP-Itanium, Linux, Solaris, Windows, z/OS

Reducing the size of the message tree

When a message flow processes messages, the input bitstream is split up by a parser and allocated to fields in the message trees. The resultant message tree uses far more memory than the bitstream itself. Therefore, when processing large messages, the memory cost is much greater.

Because the message tree is stored in UCS-2 (2-byte Unicode), text fields from a single-byte code page double in size in memory when the input message data is distributed into its separate fields. Each message tree field must also have a name, a field type, and other underlying information that is used internally by IBM® Integration Bus. When the memory that is needed to maintain the message tree structure is also added, it is easy to see that a fully parsed message tree is far larger than the bitstream itself. This behavior is demonstrated in the following example of a simple fixed-length message model:
myParent: minOccurs=1, maxOccurs=unbounded
 - myElement1: STRING: :Length=1
 - myElement2: STRING: :Length=1
 - myElement3: STRING: :Length=1
 - myElement4: STRING: :Length=1
 - myElement5: STRING: :Length=1
In a single-byte codepage, an example input message is as follows:
ABCDE
These 5 bytes produce a body folder message tree that looks like the following example:
DFDL
 - myParent
   - myElement1 = A
   - myElement2 = B
   - myElement3 = C
   - myElement4 = D
   - myElement5 = E

This body message tree with 7 elements uses 62 characters just to represent the names of these 7 elements. As a minimum, the memory usage for the names is 124 bytes in UCS-2. The 5 bytes of bitstream data are also stored in the tree, and are doubled to 10 bytes because they are stored as UCS-2 data. So the message tree already needs approximately 134 bytes for just the 5 bytes of data.
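
The arithmetic above can be checked with a short sketch. It is only an illustration of the counting in this example: the variable names are hypothetical, and the code counts characters in the names and values, not the real internal storage of the product.

```python
# Element names in the example body tree (root, parent, and five children).
names = ["DFDL", "myParent"] + [f"myElement{i}" for i in range(1, 6)]
# Field values parsed from the 5-byte input message "ABCDE".
values = ["A", "B", "C", "D", "E"]

UCS2_BYTES_PER_CHAR = 2  # the tree stores text as 2-byte Unicode

name_chars = sum(len(n) for n in names)
name_bytes = name_chars * UCS2_BYTES_PER_CHAR
value_bytes = sum(len(v) for v in values) * UCS2_BYTES_PER_CHAR

print(name_chars)                # 62 characters of element names
print(name_bytes)                # 124 bytes in UCS-2
print(value_bytes)               # 10 bytes for the 5 data characters
print(name_bytes + value_bytes)  # 134 bytes in total
```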

The value of each field is stored as a syntax element. Syntax elements do not allocate the exact amount of memory for each field because doing so is inefficient when syntax elements are reused. Instead, the string storage in a syntax element has a reserve of 28 bytes, so the name and value pairs of the 7 fields take up 14 multiples of 28 bytes, giving 392 bytes. Each tree field also generally has a minimum memory usage of at least 100 bytes (and can use more, depending on the domain), which adds another 700 bytes for the 7 fields. As a result, the 5 bytes that are parsed now take up 1092 bytes. This tree structure memory usage is unavoidable, and because these resources are reused for each message, they are not directly accessible by a message flow.
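
As a rough sketch of this estimate, the following snippet applies the 28-byte string reserve and the 100-byte minimum per field to the 7 fields of the example. The constant and function names are hypothetical, and the two figures are the illustrative values from the text, not guaranteed values for any particular release or domain.

```python
STRING_RESERVE_BYTES = 28    # assumed reserve for each name or value string
FIELD_OVERHEAD_BYTES = 100   # assumed minimum structure cost per tree field

def estimate_tree_bytes(field_count):
    """Estimate tree memory for simple fields, each with one name and one value."""
    string_bytes = field_count * 2 * STRING_RESERVE_BYTES  # name + value per field
    structure_bytes = field_count * FIELD_OVERHEAD_BYTES
    return string_bytes + structure_bytes

# DFDL root, myParent, and myElement1..myElement5
print(estimate_tree_bytes(7))  # 392 + 700 = 1092
```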

However, consider an input message that uses the unbounded nature of the repeating parent structure and contains 1000 repetitions of the 5-byte input message:
ABCDEABCDEABCDE .........ABCDE.......

The input bitstream data alone would be 5000 bytes. A fully parsed message tree for this input bitstream contains 6001 fields (that is, 6 for each of the 1000 repeating records and 1 for the DFDL root element). Assuming again that there are 100 bytes for each of the 6 repeating fields, plus 6 * 2 lots of 28 bytes for the simple name and value pairs, each repeating record occupies 936 bytes. So for a 5000-byte input message, the 1000 repeating records would occupy 936000 bytes, which is getting close to 1 MB. These numbers are given only to demonstrate the weight of the tree structure in memory, and their exact range and maximum values change between releases. However, it is easy to see that for more complex models and for more repetitions, the memory usage becomes much larger.
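
The scaling can be sketched as follows, to compare the estimated tree size with the bitstream size for the repeating records. All names are hypothetical, the per-field figures are the illustrative values used in this topic, and the DFDL root element is left out, as it is in the 936000-byte figure.

```python
FIELD_OVERHEAD_BYTES = 100   # assumed minimum per tree field
STRING_RESERVE_BYTES = 28    # assumed reserve for each name or value string
FIELDS_PER_RECORD = 6        # myParent plus myElement1..myElement5
RECORD_BITSTREAM_BYTES = 5   # "ABCDE" in a single-byte code page

def repeating_records_bytes(records):
    """Estimated tree memory for the repeating records only (root excluded)."""
    per_record = FIELDS_PER_RECORD * (FIELD_OVERHEAD_BYTES + 2 * STRING_RESERVE_BYTES)
    return records * per_record

records = 1000
tree_bytes = repeating_records_bytes(records)
bitstream_bytes = records * RECORD_BITSTREAM_BYTES

print(tree_bytes)       # 936000
print(bitstream_bytes)  # 5000
print(f"expansion factor: about {tree_bytes // bitstream_bytes}x")
```

Under these assumptions the parsed tree is roughly 187 times the size of the input bitstream, and the ratio stays the same as the number of repetitions grows, so the absolute memory cost rises linearly with message size.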

So how do I reduce the size of the message tree?

Some simple ways to reduce the size of the message tree are:
  1. Build a smaller message tree where possible. Use compact parsers such as XMLNSC, DFDL, and RFH2C, and use opaque parsing.
  2. You can improve performance by reducing the number of times that the message tree is copied, by using the following techniques:
    • Reduce the number of Compute nodes and JavaCompute nodes in a message flow.
    • If possible, set the Compute Mode property on the node to not include the message.
    • Copy at an appropriate level in the message tree; for example, copy once rather than for multiple branch nodes.
    • Copy data to the environment.

        Last updated: 2016-08-12 11:20:23