Processing XML Documents

You can process XML documents from your RPG program by using the XML-INTO or XML-SAX operation codes. These operation codes are the RPG language interface to the high-speed XML parser. The parser that RPG currently uses is a non-validating parser: it does not check documents against a DTD or schema, although it does check them for many well-formedness errors. For more information on the XML parser, see the "XML Conformance" section in the "XML Reference Material" appendix of the ILE COBOL Programmer's Guide.
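
The difference between checking well-formedness and validating is not specific to RPG. As an illustration only, the sketch below uses Python's standard xml.sax module (an expat-based, non-validating parser), not the RPG parser; the helper name is_well_formed is hypothetical. A document that breaks the XML syntax rules, such as one with a mismatched end tag, causes a fatal parse error, while no DTD or schema check takes place.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(document: str) -> bool:
    """Hypothetical helper: True if the document parses without a fatal error.

    Python's expat-based xml.sax parser is non-validating: it checks the
    XML syntax rules only, not a DTD or schema.
    """
    try:
        # A bare ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(document.encode("utf-8"), ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


print(is_well_formed('<email type="text"><sendto>x</sendto></email>'))  # True
print(is_well_formed('<email type="text"><sendto>x</email>'))           # False: mismatched tag
```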

The XML documents can be in a character or UCS-2 RPG variable, or they can be in an Integrated File System file.

The parser is a SAX parser. A SAX parser reads the XML document character by character. Whenever it has located a fragment of the document, such as an element name or an attribute value, it calls a handling procedure provided by the caller of the parser and passes it information about that fragment. For example, when the parser finds an XML element name, it calls the handling procedure, indicating that the "event" is a "start element" event and passing it the name of the element.

The handling procedure processes the information and returns to the parser, which continues reading the XML document until it has enough information to call the handling procedure with another event. This process repeats until the entire XML document has been parsed or until the handling procedure indicates that parsing should end.
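
The callback cycle described above, including a handler that ends parsing early, can be sketched with Python's standard xml.sax module. This illustrates the SAX model only and is not RPG code: in xml.sax a handler conventionally ends the parse by raising an exception, and the names StopParsing, FirstSendtoHandler and first_sendto are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class StopParsing(Exception):
    """Hypothetical signal raised by the handler to end parsing early."""


class FirstSendtoHandler(ContentHandler):
    """Collects the text of the first <sendto> element, then stops the parse."""

    def __init__(self):
        super().__init__()
        self.in_sendto = False
        self.chunks = []
        self.result = None

    def startElement(self, name, attrs):
        if name == "sendto":
            self.in_sendto = True

    def characters(self, content):
        # One run of text may be delivered in several calls, so accumulate it.
        if self.in_sendto:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "sendto":
            self.result = "".join(self.chunks)
            # Tell the parser to stop: no further events reach this handler.
            raise StopParsing()


def first_sendto(document: str):
    handler = FirstSendtoHandler()
    try:
        xml.sax.parseString(document.encode("utf-8"), handler)
    except StopParsing:
        pass  # early end requested by the handler, not an error
    return handler.result


print(first_sendto("<email><sendto>a</sendto><sendto>b</sendto></email>"))  # a
```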

For example, consider the following XML document:

<email type="text">
  <sendto>JohnDoe@there</sendto>
</email>

The following are the fragments of text that the parser would read, the events that it would generate, and the data associated with each event. Note: The term "whitespace" refers to end-of-line characters, tab characters and blanks.

Parsed text      Event             Event data
-                start document    -
<email           start element     "email"
type=            attribute name    "type"
"text"           attribute value   "text"
>whitespace      element content   the whitespace
<sendto>         start element     "sendto"
JohnDoe@there    element content   "JohnDoe@there"
</sendto>        end element       "sendto"
whitespace       element content   the whitespace
</email>         end element       "email"
-                end document      -
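
To compare this sequence with another SAX parser, the sketch below feeds the example document to Python's xml.sax and records each callback. It is an illustration, not RPG: xml.sax reports attributes together with the start-element event and may split character data across several calls, so the hypothetical TraceHandler class regroups them so that they line up with the rows of the table.

```python
import xml.sax
from xml.sax.handler import ContentHandler

DOCUMENT = """<email type="text">
  <sendto>JohnDoe@there</sendto>
</email>"""


class TraceHandler(ContentHandler):
    """Records (event, data) pairs in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # pending character data, which may arrive in pieces

    def _flush_text(self):
        # Report each contiguous run of character data as one event.
        if self._text:
            self.events.append(("element content", "".join(self._text)))
            self._text = []

    def startDocument(self):
        self.events.append(("start document", None))

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start element", name))
        # xml.sax passes attributes with the start element; split them out
        # to mirror the separate attribute events in the table.
        for attr in attrs.getNames():
            self.events.append(("attribute name", attr))
            self.events.append(("attribute value", attrs.getValue(attr)))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end element", name))

    def endDocument(self):
        self.events.append(("end document", None))


handler = TraceHandler()
xml.sax.parseString(DOCUMENT.encode("utf-8"), handler)
for event, data in handler.events:
    print(f"{event:16} {data!r}")
```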

The XML-SAX and XML-INTO operation codes allow you to use the XML parser.

  1. The XML-SAX operation allows you to specify an event handling procedure to handle every event that the parser generates. This is useful if you do not know in advance what an XML document may contain.

    For example, if you know that an XML document will contain an XML attribute named type and you want to know its value, your handling procedure can wait for an "attribute name" event whose data is "type". The next time the handler is called, it should receive an "attribute value" event carrying the required data ("text" in the example above).

  2. The XML-INTO operation allows you to read the contents of an XML document directly into an RPG variable. This is useful if you know the format of the XML document and you know that the names of the XML elements in the document will be the same as the names you have given to your RPG variables.

    For example, if you know that the XML document will always have the form of the document above, you can define an RPG data structure named "email" with subfields "type" and "sendto". You can then use the XML-INTO operation to read the XML document directly into the data structure. When the operation is complete, the "type" subfield will have the value "text" and the "sendto" subfield will have the value "JohnDoe@there".

  3. The XML-INTO operation also allows you to obtain the values of an unknown number of repeated XML elements. You provide a handling procedure that receives the values of a fixed number of elements each time the handling procedure is called. This is useful if you know that the XML document will contain a series of identical XML elements, but you don't know in advance how many there will be.
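
The third case, a handling procedure that receives repeated elements a fixed number at a time, can be approximated on top of a SAX parser. This Python sketch uses xml.sax and is not RPG; BatchingHandler and its parameters are hypothetical. It collects the text of each matching element and passes the values to a callback in groups of batch_size, with any final partial group delivered at the end of the document.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BatchingHandler(ContentHandler):
    """Hypothetical: passes the text of every <tag> element to `callback`,
    batch_size values at a time."""

    def __init__(self, tag, batch_size, callback):
        super().__init__()
        self.tag = tag
        self.batch_size = batch_size
        self.callback = callback
        self.batch = []
        self.text = None  # list of text chunks while inside <tag>, otherwise None

    def startElement(self, name, attrs):
        if name == self.tag:
            self.text = []

    def characters(self, content):
        # Text can arrive in several pieces, so collect it until the end tag.
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.tag:
            self.batch.append("".join(self.text))
            self.text = None
            if len(self.batch) == self.batch_size:
                self.callback(self.batch)
                self.batch = []  # fresh list; the callback keeps the old one

    def endDocument(self):
        # Deliver the last, possibly smaller, batch.
        if self.batch:
            self.callback(self.batch)
            self.batch = []


doc = "<list>" + "".join(f"<sendto>user{i}@there</sendto>" for i in range(5)) + "</list>"
batches = []
xml.sax.parseString(doc.encode("utf-8"), BatchingHandler("sendto", 2, batches.append))
print(batches)
# [['user0@there', 'user1@there'], ['user2@there', 'user3@there'], ['user4@there']]
```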

The parser always returns XML data in text form. If the data represents another type, such as numeric or date data, the XML-SAX handling procedure must convert it with a built-in function such as %INT or %DATE.
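
Python's xml.sax behaves the same way: every value reaches the handler as a string. The hypothetical OrderHandler below converts two made-up elements, <qty> and <shipdate>, with int() and date.fromisoformat(), which play roughly the role that %INT and %DATE play in an RPG handling procedure. This is a Python illustration, not RPG code.

```python
import xml.sax
from datetime import date
from xml.sax.handler import ContentHandler


class OrderHandler(ContentHandler):
    """Hypothetical handler that converts the text of <qty> and <shipdate>
    itself, because SAX events deliver every value as a string."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.qty = None
        self.ship_date = None

    def startElement(self, name, attrs):
        self.chunks = []  # start collecting text for this element

    def characters(self, content):
        self.chunks.append(content)

    def endElement(self, name):
        text = "".join(self.chunks).strip()
        if name == "qty":
            self.qty = int(text)                       # roughly what %INT does in RPG
        elif name == "shipdate":
            self.ship_date = date.fromisoformat(text)  # roughly what %DATE does in RPG
        self.chunks = []


handler = OrderHandler()
xml.sax.parseString(b"<order><qty>12</qty><shipdate>2024-03-01</shipdate></order>", handler)
print(handler.qty + 1, handler.ship_date.year)  # 13 2024
```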

The XML-INTO operation will automatically convert the character data to the type of the field or subfield specified as the receiver.
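
The name matching and automatic conversion that XML-INTO performs can be imitated, in a much simplified form, with Python dataclasses. The sketch below is hypothetical and far less capable than XML-INTO: xml_into looks only at the root element's attributes and direct child elements, and converts each value according to the field's declared type (str, int, float or date). The priority element is added to the earlier email example purely to show a numeric conversion.

```python
import dataclasses
import xml.etree.ElementTree as ET
from datetime import date
from typing import get_type_hints


def xml_into(cls, document: str):
    """Hypothetical: fill dataclass `cls` from a document whose root element
    has attributes or child elements with the same names as the fields."""
    root = ET.fromstring(document)
    hints = get_type_hints(cls)
    values = {}
    for field in dataclasses.fields(cls):
        # Match by name: an attribute first, otherwise a direct child element.
        if field.name in root.attrib:
            text = root.attrib[field.name]
        else:
            child = root.find(field.name)
            if child is None:
                raise ValueError(f"no attribute or element named {field.name!r}")
            text = child.text or ""
        # Convert the text to the field's declared type.
        target = hints[field.name]
        if target is date:
            values[field.name] = date.fromisoformat(text.strip())
        elif target is str:
            values[field.name] = text
        else:
            values[field.name] = target(text.strip())  # e.g. int or float
    return cls(**values)


@dataclasses.dataclass
class Email:
    type: str
    sendto: str
    priority: int


email = xml_into(Email, '<email type="text"><sendto>JohnDoe@there</sendto><priority>3</priority></email>')
print(email)  # Email(type='text', sendto='JohnDoe@there', priority=3)
```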

Both the XML-SAX and XML-INTO operations allow you to specify a series of options that control the operation. The options are specified in a single character expression of the form:

'opt1=val1 opt2=val2'
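
The option string is a blank-separated list of name=value pairs. The hypothetical Python helper below splits such a string into a dictionary; it illustrates only the layout and does not reproduce whatever rules RPG actually applies to option names and values.

```python
def parse_options(options: str) -> dict:
    """Hypothetical helper: split 'opt1=val1 opt2=val2' into a dict."""
    result = {}
    for pair in options.split():  # pairs are separated by blanks
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        result[name] = value
    return result


print(parse_options("opt1=val1 opt2=val2"))  # {'opt1': 'val1', 'opt2': 'val2'}
```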

Each operation has its own set of valid options. The options that are common to both operation codes are:

doc
The "doc" option specifies whether the XML document that you provide to the operation is the name of an Integrated File System file containing the document, or the document itself. The default is "doc=string" indicating that you have provided an actual XML document. You use the option "doc=file" to indicate that you have provided the name of a file containing the actual XML document.
ccsid
The "ccsid" option specifies the CCSID in which the XML parser will return data. For the XML-SAX operation, you can specify any CCSID that the parser supports. For the XML-INTO operation, you can control only whether parsing is done in single-byte characters or in UCS-2. See the ILE RPG Reference for more information on the "ccsid" option for each of these operations.
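
Python's xml.sax offers a choice similar to the "doc" option through two entry points: parseString for a document held in memory and parse for a file name. The sketch below wraps both in a hypothetical count_elements function whose doc parameter mirrors the RPG option; it is an illustration, not RPG. The "ccsid" option has no direct counterpart here, because xml.sax always reports data as Python (Unicode) strings.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(source: str, doc: str = "string") -> int:
    """Hypothetical analogue of the "doc" option: with doc="string" the
    argument is the XML text itself; with doc="file" it is a file name."""
    handler = ElementCounter()
    if doc == "string":
        xml.sax.parseString(source.encode("utf-8"), handler)
    elif doc == "file":
        xml.sax.parse(source, handler)  # xml.sax.parse accepts a file name
    else:
        raise ValueError(f"doc must be 'string' or 'file', not {doc!r}")
    return handler.count


text = '<email type="text"><sendto>JohnDoe@there</sendto></email>'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    print(count_elements(text), count_elements(path, doc="file"))  # 2 2
```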

