XML parsers and domains

You can use XML domains to parse and write messages that conform to the W3C XML standard.

The term XML domains refers to a group of three domains that are used by IBM® Integration Bus to parse XML documents.

When reading an XML message, the parser that is associated with the domain builds a message tree from the input bit stream. The input bit stream must be a well-formed XML document that conforms to the W3C XML Specification (version 1.0 or 1.1).
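
As an illustration of the well-formedness requirement only, the following minimal sketch uses the Python standard library to test whether a document parses. It is not the integration node parser and does not reproduce its behavior; the function name is_well_formed and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects anything that is not well-formed XML.
        return False
    return True


print(is_well_formed("<Envelope><Header/></Envelope>"))  # True
print(is_well_formed("<Envelope><Header></Envelope>"))   # False: Header is never closed
```

A document that fails this kind of check cannot be turned into a message tree, because there is no unambiguous element structure to build the tree from.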

When writing a message, the parser creates an XML bit stream from a message tree.

The domains have different characteristics. For guidance about which domain to choose, see Which XML parser should you use?.

XMLNSC domain

The XMLNSC domain is the preferred domain for parsing all general purpose XML messages, including those messages that use XML namespaces. This parser is the preferred parser for the following reasons:

The XMLNSC parser has an architecture that is designed for high performance when parsing all kinds of XML.
The XMLNSC parser reduces the amount of memory that is used by the logical message tree that is created from the parsed message. The default behavior of the parser is to discard non-significant white space and mixed content, comments, processing instructions, and embedded DTDs; however, controls are provided to retain mixed content, comments, and processing instructions, if required.
The XMLNSC parser can operate as a model-driven parser, and can validate XML messages against XML Schemas generated from a message set, to ensure that your XML messages are correct.

XMLNS domain

If the XMLNSC domain does not meet your requirements, use the alternative namespace-aware domain and parser.
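
To show what namespace-aware parsing means in general terms, the following sketch uses the Python standard library, which resolves each prefix to its namespace URI. It is an illustration only and does not use any IBM Integration Bus API; the sample document, the namespace URI http://example.com/po, and the helper split_qualified_name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical message that uses a namespace prefix.
doc = '<po:Order xmlns:po="http://example.com/po"><po:Item>1</po:Item></po:Order>'
root = ET.fromstring(doc)


def split_qualified_name(tag):
    """Split ElementTree's '{uri}local' form into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # element is in no namespace


print(split_qualified_name(root.tag))     # ('http://example.com/po', 'Order')
print(split_qualified_name(root[0].tag))  # ('http://example.com/po', 'Item')
```

A parser that is not namespace-aware treats po:Order as an opaque name, so two documents that bind different prefixes to the same namespace URI appear to contain different elements.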

XML domain

The XML domain is not namespace-aware. It is deprecated and must not be used to develop new message flows.

The MRM domain also provides XML parsing and writing facilities. For guidance on when you might use MRM XML instead of one of the XML parsers, see Which XML parser should you use?.

By default, the three XML parsers are programmatic parsers and do not use a message set at run time when parsing and writing. However, the XMLNSC parser can operate as a model-driven parser and can validate XML messages for correctness against XML Schemas generated from a message set.

When you use the XMLNS or XML parsers, or the XMLNSC parser without a message set, it is good practice to create and use a message set in the IBM Integration Toolkit; this action simplifies the development of your message flow applications, even though the message set is not deployed to the integration node run time.

For the advantages of creating a message set, see Why model messages?.

The XML parsers are on-demand parsers. For more information, see Parsing on demand.

The topics in this product documentation provide a summary of XML terminology, concepts, and message constructs. These aspects are important when you use XML messages in your message flows.

Tip: For more detailed information about XML, see the World Wide Web Consortium (W3C) Web site.

Example XML message parsing

A simple XML message might take the following form:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Envelope
PUBLIC "http://www.ibm.com/dtds" "example.dtd"
[<!ENTITY Example_ID "ST_TimeoutNodes Timeout Request Input Test Message">]
>
<Envelope version="1.0">
	<Header>
		<Example>&Example_ID;</Example>
		<!-- This is a comment  -->
	</Header>
	<Body  version="1.0">
		<Element01>Value01</Element01>
		<Element02/>
		<Element03>
			<Repeated>ValueA</Repeated>
			<Repeated>ValueB</Repeated>
		</Element03>
		<Element04><P>This is <B>bold</B> text</P></Element04>
	</Body>
</Envelope>

The following sections show the output that is created by the Trace node when this example message has been parsed by the XMLNS and XMLNSC parsers. They demonstrate the differences in the internal structures that are used to represent the data as it is processed by the integration node.

Example XML Message parsed in the XMLNS domain

In the following example, the white space elements within the tree are present because of the spaces, tabs, and line breaks that format the original XML document; for clarity, the actual characters in the trace have been replaced with 'WhiteSpace'. White space within an XML element does have business meaning, and is represented by using the Content syntax element. The XmlDecl, DTD, and comments are represented in the XMLNS domain by using explicit syntax elements with specific field types.

(0x01000010):XMLNS        = (
    (0x05000018):XML      = (
      (0x06000011): = '1.0'
      (0x06000012): = 'UTF-8'
      (0x06000014): = 'no'
    )
    (0x06000002):         = 'WhiteSpace'
    (0x05000020):Envelope = (
      (0x06000004): = 'http://www.ibm.com/dtds'
      (0x06000008): = 'example.dtd'
      (0x05000021): = (
        (0x05000011):Example_ID = (
          (0x06000041): = 'ST_TimeoutNodes Timeout Request Input Test Message'
        )
      )
    )
    (0x06000002):         = 'WhiteSpace'
    (0x01000000):Envelope = (
      (0x03000000):version = '1.0'
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Header  = (
        (0x02000000):        = 'WhiteSpace'
        (0x01000000):Example = (
          (0x06000020): = 'Example_ID'
          (0x02000000): = 'ST_TimeoutNodes Timeout Request Input Test Message'
          (0x06000021): = 'Example_ID'
        )
        (0x02000000):        = 'WhiteSpace'
        (0x06000018):        = ' This is a comment  '
        (0x02000000):        = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Body    = (
        (0x03000000):version   = '1.0'
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element01 = (
          (0x02000000): = 'Value01'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element02 = 
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element03 = (
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueA'
          )
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueB'
          )
          (0x02000000):         = 'WhiteSpace'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x02000000):  = 'This is '
            (0x01000000):B = (
              (0x02000000): = 'bold'
            )
            (0x02000000):  = ' text'
          )
        )
        (0x02000000):          = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
    )
)
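
The way formatting white space becomes part of a parsed tree can be reproduced with any parser that retains all character data. The following sketch uses xml.dom.minidom from the Python standard library on a shortened version of the Body element; it is an analogy to the trace above, not the XMLNS parser itself, and the helper describe_children is an assumption made for this example.

```python
from xml.dom import minidom

# Shortened Body element, formatted with line breaks and tabs like the example.
BODY = """<Body version="1.0">
\t<Element01>Value01</Element01>
\t<Element02/>
</Body>"""


def describe_children(element):
    """List child nodes in document order, marking white-space-only text."""
    rows = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            rows.append("WhiteSpace" if node.data.strip() == "" else repr(node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            rows.append(node.tagName)
    return rows


body = minidom.parseString(BODY).documentElement
print(describe_children(body))
# ['WhiteSpace', 'Element01', 'WhiteSpace', 'Element02', 'WhiteSpace']
```

Two child elements produce five child nodes, which mirrors the alternating WhiteSpace entries under Body in the trace.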

Example XML Message parsed in the XMLNSC domain

The following trace shows the elements that are created to represent the same XML structure within the compact XMLNSC parser in its default mode. In this mode, the compact parser does not retain comments, processing instructions, or mixed text.

The example illustrates the significant saving in the number of syntax elements that are used to represent the same business content of the example XML message when using the compact parser.

By not retaining mixed text, all of the white space elements that have no business data content no longer take any space in the integration node message tree at run time. However, the mixed text in Element04.P is also discarded, and only the value of the child folder, Element04.P.B, is held in the tree; the text 'This is ' and ' text' in P is discarded. This type of XML structure is not typically associated with business data formats; therefore, use of the compact XMLNSC parser is typically desirable. However, if you need this type of processing, either do not use the XMLNSC parser, or use it with Retain mixed text mode enabled.
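
The effect of discarding mixed text can be sketched outside the product. The following Python example is a simplified model of the behavior described above and is not the XMLNSC implementation: it ignores attributes, and the function name compact and the tuple representation are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def compact(element):
    """Return (name, value): text for a leaf element, a child list for a folder."""
    children = [compact(child) for child in element]
    if children:
        # Folder: keep only the child elements. Any text around them
        # (mixed content and formatting white space) is dropped.
        return (element.tag, children)
    # Leaf: keep the text value (None for an empty element).
    return (element.tag, element.text)


doc = "<Element04>\n\t<P>This is <B>bold</B> text</P>\n</Element04>"
print(compact(ET.fromstring(doc)))
# ('Element04', [('P', [('B', 'bold')])])
```

As in the compact trace, only the value of B survives under P; the surrounding text is lost, which is why a mode that retains mixed text is needed for documents of this kind.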

The handling of the XML declaration is also different in the XMLNSC parser. The version, encoding, and stand-alone attributes are held as child entities of the XmlDeclaration, rather than as elements with a particular field type.

(0x01000000):XMLNSC     = (
    (0x01000400):XmlDeclaration = (
      (0x03000100):Version    = '1.0'
      (0x03000100):Encoding   = 'UTF-8'
      (0x03000100):StandAlone = 'no'
    )
    (0x01000000):Envelope       = (
      (0x03000100):version = '1.0'
      (0x01000000):Header  = (
        (0x03000000):Example = 'ST_TimeoutNodes Timeout Request Input Test Message'
      )
      (0x01000000):Body    = (
        (0x03000100):version   = '1.0'
        (0x03000000):Element01 = 'Value01'
        (0x01000000):Element02 = 
        (0x01000000):Element03 = (
          (0x03000000):Repeated = 'ValueA'
          (0x03000000):Repeated = 'ValueB'
        )
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x03000000):B = 'bold'
          )
        )
      )
    )
)

Some predefined message models are supplied with the IBM Integration Toolkit and can be imported by using the New Message Definition File wizard and selecting the IBM supplied message option. See Message Sets: IBM supplied messages that you can import.