Large output messages
So far, the recommendations have covered the input message trees and how these trees can be handled and copied efficiently. However, some message flows might need to generate a large output message, especially if a large input message is being augmented and routed. Consider the following example message tree:
Root
- Properties
  - ...
- MQMD
  - ...
- XMLNSC
  - TestCase
    - Record
      - Field01 = A
      - Field02 = B
      - Field03 = C
      - Field04 = D
      - Field05 = E
    - Record
      - ...
    - Record
      - ...
In this example tree, Record repeats 100 000 times. Each Record parent field, combined with its five child fields, gives 600 000 output fields for the repeating records. The resulting large output message tree could cause memory issues in the DataFlowEngine process.
While most domains can parse large input messages by using large message handling techniques, only a few domains are able to store large message trees. The domain must be able to serialize by using the FolderBitStream option (that is, serialize a portion of the tree at a time), and the domain's serialization must support elements of type BitStream.
The following ESQL builds the example tree in full:

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
DECLARE currentRecord REFERENCE TO outRef;
DECLARE recordTotal INT 100000;
DECLARE recordCount INT 0;
WHILE recordCount < recordTotal DO
CREATE LASTCHILD OF outRef AS currentRecord NAME 'Record';
SET currentRecord.Field01 = 'A';
SET currentRecord.Field02 = 'B';
SET currentRecord.Field03 = 'C';
SET currentRecord.Field04 = 'D';
SET currentRecord.Field05 = 'E';
SET recordCount = recordCount + 1;
END WHILE;
When you augment a large message, the large message handling techniques for parsing and writing large messages can be combined. The next record is parsed by using partial parsing. This record can then be updated, serialized by calling ASBITSTREAM on it in FolderBitStream mode, and inserted into the tree as a BitStream field. The current record is then deleted, as shown in the following example:

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
DECLARE currentRecord REFERENCE TO outRef;
CREATE LASTCHILD OF currentRecord AS currentRecord NAME 'Record';
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
DECLARE folderBytes BLOB;
DECLARE recordTotal INT 100000;
DECLARE recordCount INT 0;
WHILE recordCount < recordTotal DO
SET currentRecord.Field01 = 'A';
SET currentRecord.Field02 = 'B';
SET currentRecord.Field03 = 'C';
SET currentRecord.Field04 = 'D';
SET currentRecord.Field05 = 'E';
SET folderBytes = ASBITSTREAM(currentRecord OPTIONS FolderBitStream);
CREATE LASTCHILD OF outRef TYPE XMLNSC.BitStream NAME 'Record' VALUE folderBytes;
SET recordCount = recordCount + 1;
END WHILE;
DELETE FIELD currentRecord; -- Free up the temporary record that was used in serialization.
Although a message flow that handles real business data is likely to be more complex than this example, the technique is the same. Use ASBITSTREAM in FolderBitStream mode on the next record, insert the result into the tree as a BitStream field, and then delete the current record, as shown in the following example:

-- Parse in the Environment message tree to avoid clashing with the OutputRoot.XMLNSC folder.
CREATE FIELD Environment.Variables.XMLNSC DOMAIN('XMLNSC');
SET Environment.Variables.XMLNSC = InputRoot.XMLNSC; -- Assumes that an unparsed XMLNSC folder is copied by its bitstream
DECLARE inputRef REFERENCE TO Environment.Variables.XMLNSC.RepeatingRecord;
--Create the output folders
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC;
CREATE LASTCHILD OF outRef AS outRef NAME 'TestCase';
WHILE LASTMOVE(inputRef) = TRUE DO
SET inputRef.Field02 = 'Z';
DECLARE folderBytes BLOB ASBITSTREAM(inputRef OPTIONS FolderBitStream);
CREATE LASTCHILD OF outRef TYPE XMLNSC.BitStream NAME 'RepeatingRecord' VALUE folderBytes;
DECLARE previousRecord REFERENCE TO inputRef;
MOVE inputRef NEXTSIBLING REPEAT NAME; -- Move to the next repeating record
DELETE FIELD previousRecord; -- Delete the record that has been processed
END WHILE;
DELETE FIELD Environment.Variables.XMLNSC; -- Delete the XMLNSC folder that was being used.
Large data is often associated with the File transport. The FileOutput node can append data to a file during flow processing, so you can write message flows in which records are written to a file individually; large output message trees are then not required. The FolderBitStream and BitStream elements provide a method of constructing a large message tree in which multiple records are represented by single elements. If a domain does not support these capabilities but is model-based, it is possible to write an alternative model in which records are represented as single binary elements. If a global element represents the record content, the use of the FolderBitStream and BitStream elements can be avoided.
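The record-at-a-time pattern that appending to a file enables can be sketched in Python (an analogy to illustrate the idea, not integration node code): each record is serialized and appended to the output file as it is produced, so memory usage is bounded by one record rather than by the whole output.

```python
import os
import tempfile

def append_records(path, records):
    """Append each record to the file as it is produced, so only one
    record at a time is held in memory."""
    with open(path, "ab") as f:  # append mode: earlier content stays in place
        for record in records:
            f.write(record.encode("utf-8") + b"\n")

# Usage: stream many records to a file without building a large buffer.
path = os.path.join(tempfile.mkdtemp(), "records.out")
append_records(path, ("Field01=A,Field02=B" for _ in range(1000)))
```

Because the generator yields one record at a time and the file is opened for append, the peak memory cost is a single record, regardless of how many records are written.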
Binary large object (BLOB) processing
Consider the following ESQL, which builds the output message by repeated concatenation:

DECLARE c, d CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET d = c;
DECLARE i INT 1;
WHILE (i <= 56) DO
SET c = c || d;
SET i = i + 1;
END WHILE;
SET OutputRoot.BLOB.BLOB = CAST(c AS BLOB CCSID InputProperties.CodedCharSetId);
- A 1 MB input message is assigned to variable c, and is then also copied to d.
- The loop then concatenates c and d, and assigns the result back to c on each iteration.
- Variable c grows by 1 MB on every iteration.
After 56 iterations, c is 57 MB, so you might expect the flow to need roughly 57 MB of storage for this processing. However, this is not the case. This ESQL causes significant growth in the integration server's storage usage because of the nature of the processing. This ESQL encourages what is known as fragmentation in the memory heap. In this condition, the heap has enough free space in total, but no contiguous free block that is large enough to satisfy the current request.
When you deal with BLOB or CHAR scalar variables in ESQL, the values must be held in contiguous buffers in memory: single blocks of storage that are large enough to hold the whole value. Therefore, when the statement SET c = c || d; is executed, it is not just a case of appending the value of d to the current memory location of c. The concatenation operator takes two operands and assigns the result to another variable, and in this case the target variable is also one of the inputs.
So logically the concatenation operator could be written: SET c = concatenate(c,d);
This is not valid syntax, but it illustrates that the operator behaves like any other function with two operands.
The value that was originally contained in c cannot be freed until the operation is complete, because c is an input to the operation. Furthermore, the result of the operation must be held in temporary storage before it can be assigned to c. Because the values grow on every iteration, it becomes increasingly likely that the heap does not have a contiguous free block that is large enough to contain the larger value: the blocks that are freed are always smaller than the values that are being generated.
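The same two-operand behavior can be observed in Python (an analogy only; the CPython runtime differs from the ESQL runtime in detail): the concatenation builds its result in new storage, and only then is c rebound to it, so for a moment the old value, the copy in d, and the new value all exist at once.

```python
c = "A" * 1_000_000   # a 1 MB value, standing in for the CAST of the input BLOB
d = c                 # SET d = c;
old_id = id(c)        # identity of the buffer that currently holds c

c = c + d             # SET c = c || d;  the result is built in a new buffer

assert id(c) != old_id      # c now refers to new, larger storage
assert len(c) == 2_000_000  # both operands had to exist while it was built
assert len(d) == 1_000_000  # the old value is still alive through d
```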
Why does the memory usage grow?
Look again at the ESQL:

DECLARE c, d CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET d = c;
DECLARE i INT 1;
WHILE (i <= 56) DO
SET c = c || d;
SET i = i + 1;
END WHILE;
SET OutputRoot.BLOB.BLOB = CAST(c AS BLOB CCSID InputProperties.CodedCharSetId);
So, now consider the possible pattern of allocations that could take place in this scenario. The best possible case is where no other threads are running that might make allocations in the freed blocks. This scenario also assumes that during the execution of ESQL for this thread, no small allocations are made in the free blocks. In a real integration server, these allocations would take place even with one flow running, because the broker has administration threads that run periodically. As this scenario considers only the memory increases, the starting size of the integration server is ignored and only the allocations that are made around the while loop are discussed.
- The variables c and d are 1 MB blocks, so each character in the following diagrams represents 1 MB.
- The X character represents a used block.
- The - character represents a free block.
- First, c and d occupy a total of 2 MB of storage:

    c d
    X X
- Then, c and d are concatenated, which requires another 2 MB of storage on the heap to hold the result of the concatenation, giving:

    c d c d
    X X X X
- After the result is assigned to c (call this 2 MB value c1), the original 1 MB block for c is freed:

    - d c d
    - X X X
        |_|
         c1
- The heap grows to 4 MB, with 1 MB free. Now, d is concatenated to c again, which needs 3 MB because c1 is 2 MB. There is not 3 MB free on the heap, so the heap must expand by 3 MB to give:

    - d c d c d d
    - X X X X X X
        |_| |___|
         c1   c2
- Now the original c1 is freed, which gives a heap of 7 MB, with 3 MB of free blocks:

    - d - - c d d
    - X - - X X X
            |___|
              c2
- A further concatenation of 3 MB and 1 MB now requires 4 MB for the result, and there is not a contiguous 4 MB block on the heap. Therefore, the heap must expand to satisfy this request, giving:

    - d - - c d d c d c d
    - X - - X X X X X X X
            |___| |_____|
              c2    c3
- And the original c2 is freed, to give a heap of 11 MB, with 6 MB of free blocks:

    - d - - - - - c d c d
    - X - - - - - X X X X
                  |_____|
                    c3
So even in this unrealistic best-case scenario, the heap keeps expanding, because the input value cannot be freed while the current iteration is being processed; the heap must contain both the inputs and the result at the same time. If this pattern is projected out to 56 iterations, producing a 57 MB output message, the integration server uses 500 - 600 MB of memory, which is much larger than the original 57 MB estimate.
However, this was the best-case scenario. In the worst case there is no heap reuse at all, and every iteration causes ever-increasing growth. Projected out, the integration server then requires over 1.5 GB of storage. It is therefore possible for this scenario to reach the point where the operating system refuses to allocate more storage to the process, which results in an abend.
One way to avoid the repeated concatenation is to build the output from multiple elements rather than one ever-growing value:

DECLARE c CHAR;
SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
DECLARE i INT 1;
WHILE (i <= 56) DO
CREATE LASTCHILD OF OutputRoot.BLOB NAME 'BLOB' VALUE c;
SET i = i + 1;
END WHILE;
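The multiple-elements technique can be sketched in Python as well (an analogy for the pattern, not node code): collect the sections separately and combine them once at the end, so no intermediate, ever-growing value is ever built.

```python
def build_by_concat(section: bytes, copies: int) -> bytes:
    """The fragmentation-prone pattern: a growing temporary on every iteration."""
    out = section
    for _ in range(copies - 1):
        out = out + section
    return out

def build_by_parts(section: bytes, copies: int) -> bytes:
    """The multiple-elements pattern: keep the pieces separate, combine once."""
    parts = [section] * copies   # like one child element per section
    return b"".join(parts)       # a single allocation for the final result

# Both produce the same output; only the allocation behavior differs.
assert build_by_parts(b"A" * 4, 57) == build_by_concat(b"A" * 4, 57)
```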