You can process XML documents from your RPG program by using the XML-INTO or XML-SAX statements. These statements are the RPG language interface to the high-speed XML parser. The parser currently being used by RPG is a non-validating parser, although it checks XML documents for many well-formedness errors. See the "XML Conformance" section in the "XML Reference Material" appendix of the ILE COBOL Programmer's Guide for more information on the XML parser.
The XML documents can be in a character or UCS-2 RPG variable, or they can be in an Integrated File System file.
The parser is a SAX parser. A SAX parser operates by reading the XML document character by character. Whenever it has located a fragment of the XML document, such as an element name, or an attribute value, it calls back to a handling procedure provided by the caller of the parser, passing it information about the fragment of XML that it has found. For example, when the parser has found an XML element name, it calls the handling procedure indicating that the "event" is a "start element" event and passing it the name of the element.
The handling procedure processes the information and returns to the parser which continues to read the XML document until it has enough information to call the handling procedure with another event. This process repeats until the entire XML document has been parsed, or until the handling procedure indicates that parsing should end.
For example, consider the following XML document:
<email type="text">
<sendto>JohnDoe@there</sendto>
</email>
The following are the fragments of text that the parser would read, the events that it would generate, and the data associated with each event. Note: The term "whitespace" refers to end-of-line characters, tab characters and blanks.
| Parsed text | Event | Event data |
|---|---|---|
| | start document | |
| <email | start element | "email" |
| type= | attribute name | "type" |
| "text" | attribute value | "text" |
| >whitespace | element content | the whitespace |
| <sendto> | start element | "sendto" |
| JohnDoe@there | element content | "JohnDoe@there" |
| </sendto> | end element | "sendto" |
| whitespace | element content | the whitespace |
| </email> | end element | "email" |
| | end document | |
The XML-SAX and XML-INTO operation codes allow you to use the XML parser.
With the XML-SAX operation, you supply the handling procedure that the parser calls for each event. For example, if you know that an XML document will contain an XML attribute with the name type, and you want to know the value of this attribute, your handling procedure can wait for an "attribute name" event whose data is "type". The next call to the handler should then be an "attribute value" event carrying the required data ("text" in the example above).
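The attribute lookup described above can be sketched in free-form RPG as follows. This is a minimal illustration, not a definitive implementation: the names info_t, info, emailHandler and xmlDoc are hypothetical, and the handler parameter list and the event names *XML_ATTR_NAME, *XML_ATTR_CHARS and *XML_END_ATTR reflect our reading of the XML-SAX description in the ILE RPG Reference, so check them there before relying on them. The sketch also assumes the default behavior of returning event data in the job CCSID.

```rpgle
**free
// Hypothetical communication area shared by the caller and the handler.
dcl-ds info_t qualified template;
  sawType   ind;            // on while inside the "type" attribute
  typeValue varchar(100);   // collected value of the "type" attribute
end-ds;

dcl-ds info likeds(info_t) inz;
dcl-s xmlDoc varchar(200);

xmlDoc = '<email type="text">'
       + '<sendto>JohnDoe@there</sendto>'
       + '</email>';

// Parse the document; the parser calls emailHandler for every event.
xml-sax %handler(emailHandler : info) %xml(xmlDoc);

// info.typeValue now holds whatever the handler collected.
*inlr = *on;
return;

dcl-proc emailHandler;
  dcl-pi *n int(10);
    commArea    likeds(info_t);
    event       int(10) value;
    string      pointer value;
    stringLen   int(20) value;
    exceptionId int(10) value;
  end-pi;

  // View of the event data; only the first stringLen bytes are valid.
  dcl-s chars char(65535) based(string);

  select;
    when event = *XML_ATTR_NAME;
      // Remember whether the attribute being started is "type".
      commArea.sawType = (%subst(chars : 1 : stringLen) = 'type');
    when event = *XML_ATTR_CHARS and commArea.sawType;
      // The value may arrive in more than one piece, so append.
      commArea.typeValue += %subst(chars : 1 : stringLen);
    when event = *XML_END_ATTR;
      commArea.sawType = *off;
  endsel;

  return 0;   // zero asks the parser to continue
end-proc;
```

Keeping the state in a communication area rather than in static variables lets the same handler be reused for several documents without leftover state from an earlier parse.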
With the XML-INTO operation, you read the contents of an XML document directly into an RPG variable. For example, if you know that the XML document will always have the form of the document above, you can define an RPG data structure with the name "email", and with subfields "type" and "sendto". Then you can use the XML-INTO operation to read the XML document directly into the data structure. When the operation is complete, the "type" subfield would have the value "text" and the "sendto" subfield would have the value "JohnDoe@there".
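A minimal free-form sketch of the data structure approach might look like the following. The variable name xmlDoc and the subfield lengths are assumptions made for this illustration; only the data structure name and the subfield names come from the sample document.

```rpgle
**free
// The data structure and subfield names match the XML element
// and attribute names. The lengths are arbitrary for this example.
dcl-ds email qualified;
  type   varchar(10);
  sendto varchar(50);
end-ds;

dcl-s xmlDoc varchar(200);

xmlDoc = '<email type="text">'
       + '<sendto>JohnDoe@there</sendto>'
       + '</email>';

// Parse the document held in xmlDoc directly into the data structure.
xml-into email %xml(xmlDoc);

// email.type and email.sendto now hold the attribute value and
// the element content from the document.
*inlr = *on;
return;
```

Note that both the attribute and the child element map to subfields, so the program does not need to distinguish between them.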
The XML data is always returned by the parser in text form. If the data is known to represent other data types such as numeric data, or date data, the XML-SAX handling procedure must use conversion functions such as %INT or %DATE to convert the data.
The XML-INTO operation will automatically convert the character data to the type of the field or subfield specified as the receiver.
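As an illustration of the conversions an XML-SAX handling procedure would perform itself, the following sketch converts text values with %INT and %DATE. The variable names and the sample values are hypothetical, and the date text is assumed to be in *ISO format (yyyy-mm-dd).

```rpgle
**free
// Text as it might be received from the parser (hypothetical values).
dcl-s qtyText  varchar(10) inz('42');
dcl-s dateText varchar(10) inz('2024-01-31');

dcl-s qty      int(10);
dcl-s shipDate date;

// Convert the character data to the required RPG data types.
qty      = %int(qtyText);
shipDate = %date(dateText : *iso);

*inlr = *on;
return;
```

If the text cannot be converted, these built-in functions signal an error, so a handler would normally guard the conversions with MONITOR.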
Both the XML-SAX and XML-INTO operations allow you to specify a series of options that control the operation. The options are specified in a single character expression in the form
'opt1=val1 opt2=val2'
Each operation has its own set of valid options. The options that are common to both operation codes are