Large output messages
So far, the recommendations have covered the input message trees and how these trees can be handled and copied efficiently. However, some message flows might need to generate a large output message, especially if a large input message is being augmented and routed. Consider the following example message tree:
Root
- Properties
  - ...
- MQMD
  - ...
- XMLNSC
  - TestCase
    - Record
      - Field01 = A
      - Field02 = B
      - Field03 = C
      - Field04 = D
      - Field05 = E
    - Record
      - ...
    - Record
      - ...
In this example tree, Record repeats 100 000 times. Each Record parent field, combined with its five child fields, gives 600 000 output fields for the repeating records. The resulting large output message tree could cause memory issues in the DataFlowEngine process.
While most domains can parse large input messages by using large message handling techniques, only a few domains are able to store large message trees. The domain must be able to serialize by using the FolderBitStream option (that is, serialize a portion of the tree at a time), and the domain's serialization must support elements of type BitStream.
The following ESQL builds the example tree in full:

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
DECLARE currentRecord REFERENCE TO outRef;
DECLARE recordTotal INT 100000;
DECLARE recordCount INT 0;
WHILE recordCount < recordTotal DO
CREATE LASTCHILD OF outRef AS currentRecord NAME 'Record';
SET currentRecord.Field01 = 'A';
SET currentRecord.Field02 = 'B';
SET currentRecord.Field03 = 'C';
SET currentRecord.Field04 = 'D';
SET currentRecord.Field05 = 'E';
SET recordCount = recordCount + 1;
END WHILE;
When you augment a large message, the large message handling techniques for parsing and writing large messages can be combined. The next record is parsed by using partial parsing. This record can then be updated, serialized by calling ASBITSTREAM on it in FolderBitStream mode, and inserted into the tree as a BitStream field. The current record is then deleted, as shown in the following example:

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
DECLARE currentRecord REFERENCE TO outRef;
CREATE LASTCHILD OF currentRecord AS currentRecord NAME 'Record';
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
DECLARE folderBytes BLOB;
DECLARE recordTotal INT 100000;
DECLARE recordCount INT 0;
WHILE recordCount < recordTotal DO
SET currentRecord.Field01 = 'A';
SET currentRecord.Field02 = 'B';
SET currentRecord.Field03 = 'C';
SET currentRecord.Field04 = 'D';
SET currentRecord.Field05 = 'E';
SET folderBytes = ASBITSTREAM(currentRecord OPTIONS FolderBitStream);
CREATE LASTCHILD OF outRef TYPE XMLNSC.BitStream NAME 'Record' VALUE folderBytes;
SET recordCount = recordCount + 1;
END WHILE;
DELETE FIELD currentRecord; -- Free up the temporary record that was used in serialization.
Although a message flow that handles real business data is likely to be more complex than this example, the technique is the same. Use ASBITSTREAM in FolderBitStream mode on the next record, insert the result into the tree as a BitStream field, and then delete the current record, as shown in the following example:

-- Parse in the Environment message tree to avoid clashing with the OutputRoot.XMLNSC folder.
CREATE FIELD Environment.Variables.XMLNSC DOMAIN('XMLNSC');
SET Environment.Variables.XMLNSC = InputRoot.XMLNSC; -- Assumes that an unparsed XMLNSC folder is copied by its bitstream
DECLARE inputRef REFERENCE TO Environment.Variables.XMLNSC.RepeatingRecord;
--Create the output folders
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
WHILE LASTMOVE(inputRef) = TRUE DO
SET inputRef.Field02 = 'Z';
DECLARE folderBytes BLOB ASBITSTREAM(inputRef OPTIONS FolderBitStream);
CREATE LASTCHILD OF outRef TYPE XMLNSC.BitStream NAME 'RepeatingRecord' VALUE folderBytes;
DECLARE previousRecord REFERENCE TO inputRef;
MOVE inputRef NEXTSIBLING REPEAT NAME; -- Move to the next repeating record
DELETE FIELD previousRecord; -- Delete the record that has been processed
END WHILE;
DELETE FIELD Environment.Variables.XMLNSC; -- Delete the XMLNSC folder that was being used.
Large data is often associated with the File transport. The FileOutput node can append data to a file during flow processing, so you can write message flows in which records are written to a file individually; large output message trees are then not required. The FolderBitStream and BitStream elements provide a method of constructing a large message tree in which multiple records are represented by single elements. If a domain does not support these capabilities but is model-based, it is possible to write an alternative model in which records are represented as single binary elements. If a global element represents the record content, the use of the FolderBitStream and BitStream elements can be avoided.
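The record-at-a-time pattern that appending to a file enables can be sketched in Python (an analogy to illustrate the idea, not integration node code): each record is serialized and appended to the output file as it is produced, so memory usage is bounded by one record rather than by the whole output.

```python
import os
import tempfile

def append_records(path, records):
    """Append each record to the file as it is produced, so only one
    record at a time is held in memory."""
    with open(path, "ab") as f:  # append mode: earlier content stays in place
        for record in records:
            f.write(record.encode("utf-8") + b"\n")

# Usage: stream many records to a file without building a large buffer.
path = os.path.join(tempfile.mkdtemp(), "records.out")
append_records(path, ("Field01=A,Field02=B" for _ in range(1000)))
```

Because the generator yields one record at a time and the file is opened for append, the peak memory cost is a single record, regardless of how many records are written.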
Binary large object (BLOB) processing
Consider the following ESQL, which builds the output message by repeated concatenation:

DECLARE c, d CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET d = c;
DECLARE i INT 1;
WHILE (i <= 56) DO
SET c = c || d;
SET i = i + 1;
END WHILE;
SET OutputRoot.BLOB.BLOB = CAST(c AS BLOB CCSID InputProperties.CodedCharSetId);
- A 1 MB input message is assigned to variable c, and is then also copied to d.
- The loop then concatenates c and d, and assigns the result back to c on each iteration.
- Variable c grows by 1 MB on every iteration.
After 56 iterations, c is 57 MB, so you might expect the flow to need roughly 57 MB of storage for this processing. However, this is not the case. This ESQL causes significant growth in the integration server's storage usage because of the nature of the processing. This ESQL encourages what is known as fragmentation in the memory heap. In this condition, the heap has enough free space in total, but no contiguous free block that is large enough to satisfy the current request.
When you deal with BLOB or CHAR scalar variables in ESQL, the values must be held in contiguous buffers in memory: single blocks of storage that are large enough to hold the whole value. Therefore, when the statement SET c = c || d; is executed, it is not just a case of appending the value of d to the current memory location of c. The concatenation operator takes two operands and assigns the result to another variable, and in this case the target variable is also one of the inputs.
So logically the concatenation operator could be written: SET c = concatenate(c,d);
This is not valid syntax, but it illustrates that the operator behaves like any other function with two operands.
The value that was originally contained in c cannot be freed until the operation is complete, because c is an input to the operation. Furthermore, the result of the operation must be held in temporary storage before it can be assigned to c. Because the values grow on every iteration, it becomes increasingly likely that the heap does not have a contiguous free block that is large enough to contain the larger value: the blocks that are freed are always smaller than the values that are being generated.
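The same two-operand behavior can be observed in Python (an analogy only; the CPython runtime differs from the ESQL runtime in detail): the concatenation builds its result in new storage, and only then is c rebound to it, so for a moment the old value, the copy in d, and the new value all exist at once.

```python
c = "A" * 1_000_000   # a 1 MB value, standing in for the CAST of the input BLOB
d = c                 # SET d = c;
old_id = id(c)        # identity of the buffer that currently holds c

c = c + d             # SET c = c || d;  the result is built in a new buffer

assert id(c) != old_id      # c now refers to new, larger storage
assert len(c) == 2_000_000  # both operands had to exist while it was built
assert len(d) == 1_000_000  # the old value is still alive through d
```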
Why does the memory usage grow?
Look again at the ESQL:

DECLARE c, d CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET d = c;
DECLARE i INT 1;
WHILE (i <= 56) DO
SET c = c || d;
SET i = i + 1;
END WHILE;
SET OutputRoot.BLOB.BLOB = CAST(c AS BLOB CCSID InputProperties.CodedCharSetId);
So, now consider the possible pattern of allocations that could take place in this scenario. The best possible case is where no other threads are running that might make allocations in the freed blocks. This scenario also assumes that during the execution of ESQL for this thread, no small allocations are made in the free blocks. In a real integration server, these allocations would take place even with one flow running, because the broker has administration threads that run periodically. As this scenario considers only the memory increases, the starting size of the integration server is ignored and only the allocations that are made around the while loop are discussed.
- The variables c and d are 1 MB blocks, so each character in the following diagrams represents 1 MB.
- The X character represents a used block.
- The - character represents a free block.
- First, c and d occupy a total of 2 MB of storage:

    c d
    X X
- Then, c and d are concatenated, which requires another 2 MB of storage on the heap to hold the result of the concatenation, giving:

    c d c d
    X X X X
- After the result is assigned to c (call this 2 MB value c1), the original 1 MB block for c is freed:

    - d c d
    - X X X
        |_|
         c1
- The heap grows to 4 MB, with 1 MB free. Now, d is concatenated to c again, which needs 3 MB because c1 is 2 MB. There is not 3 MB free on the heap, so the heap must expand by 3 MB to give:

    - d c d c d d
    - X X X X X X
        |_| |___|
         c1   c2
- Now the original c1 is freed, which gives a heap of 7 MB, with 3 MB of free blocks:

    - d - - c d d
    - X - - X X X
            |___|
              c2
- A further concatenation of 3 MB and 1 MB now requires 4 MB for the result, and there is not a contiguous 4 MB block on the heap. Therefore, the heap must expand to satisfy this request, giving:

    - d - - c d d c d c d
    - X - - X X X X X X X
            |___| |_____|
              c2    c3
- And the original c2 is freed, to give a heap of 11 MB, with 6 MB of free blocks:

    - d - - - - - c d c d
    - X - - - - - X X X X
                  |_____|
                    c3
So even in this unrealistic best-case scenario, the heap keeps expanding, because the input value cannot be freed while the current iteration is being processed; the heap must contain both the inputs and the result at the same time. If this pattern is projected out to 56 iterations, producing a 57 MB output message, the integration server uses 500 - 600 MB of memory, which is much larger than the original 57 MB estimate.
However, this was the best-case scenario. In the worst case there is no heap reuse at all, and every iteration causes ever-increasing growth. Projected out, the integration server then requires over 1.5 GB of storage. It is therefore possible for this scenario to reach the point where the operating system refuses to allocate more storage to the process, which results in an abend.
One way to avoid the repeated concatenation is to build the output from multiple elements rather than one ever-growing value:

DECLARE c CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
DECLARE i INT 1;
WHILE (i <= 56) DO
CREATE LASTCHILD OF OutputRoot.BLOB NAME 'BLOB' VALUE c;
SET i = i + 1;
END WHILE;
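The multiple-elements technique can be sketched in Python as well (an analogy for the pattern, not node code): collect the sections separately and combine them once at the end, so no intermediate, ever-growing value is ever built.

```python
def build_by_concat(section: bytes, copies: int) -> bytes:
    """The fragmentation-prone pattern: a growing temporary on every iteration."""
    out = section
    for _ in range(copies - 1):
        out = out + section
    return out

def build_by_parts(section: bytes, copies: int) -> bytes:
    """The multiple-elements pattern: keep the pieces separate, combine once."""
    parts = [section] * copies   # like one child element per section
    return b"".join(parts)       # a single allocation for the final result

# Both produce the same output; only the allocation behavior differs.
assert build_by_parts(b"A" * 4, 57) == build_by_concat(b"A" * 4, 57)
```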