Differences in an XML document after storage and retrieval
When you store an XML document in a DB2® database and then retrieve that copy from the database, the retrieved document might not be exactly the same as the original document. This behavior is defined by the XML and SQL/XML standards.
Some of the changes to the document occur when the document is stored. Those changes are:
- If you execute XMLVALIDATE, the database server:
  - Strips ignorable whitespace from the input document
- If you do not request XML validation, the database server:
  - Strips boundary whitespace, if you do not request preservation
  - Replaces all carriage return and line feed pairs (U+000D and U+000A), or carriage returns (U+000D), within the document with line feeds (U+000A)
  - Performs attribute-value normalization, as specified in the XML 1.0 specification
    This process causes line feed (U+000A) characters in attributes to be replaced with space characters (U+0020).
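The line-ending and attribute-value changes described above are standard XML 1.0 processing, so they can be observed with any conforming parser. The following is a minimal sketch that uses Python's `xml.etree.ElementTree` as a stand-in; it does not connect to DB2, and the sample document and variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a CRLF pair in element content and a line feed
# inside an attribute value.
original = '<note title="line one\nline two">first\r\nsecond</note>'

root = ET.fromstring(original)

# End-of-line handling: the CRLF pair in the content becomes one line feed.
content = root.text

# Attribute-value normalization: the line feed in the attribute becomes a space.
title = root.get("title")

print(repr(content))
print(repr(title))
```

A document that passes through this kind of processing no longer matches the original byte for byte, even though its XML information content is equivalent.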
Additional changes occur when you retrieve the data from an XML column. Those changes are:
- If the data has an XML declaration before it is sent to the database server, the XML declaration is not preserved.
  With implicit serialization for DB2 CLI and embedded SQL applications, the DB2 database server adds an XML declaration that specifies the appropriate encoding. For .NET applications, the DB2 database server also adds an XML declaration. For Java™ applications, whether the data is returned with an XML declaration added by the DB2 database server depends on the SQLXML object methods that are called to retrieve the data from the SQLXML object.
  If you execute the XMLSERIALIZE function, the DB2 database server adds an XML declaration with an encoding specification if you specify the INCLUDING XMLDECLARATION option.
- Within the content of a document or in attribute values, certain characters are replaced with their predefined XML entities. Those characters and their predefined entities are:

  | Character         | Unicode value | Entity representation |
  |-------------------|---------------|-----------------------|
  | AMPERSAND         | U+0026        | &amp;amp;             |
  | LESS-THAN SIGN    | U+003C        | &amp;lt;              |
  | GREATER-THAN SIGN | U+003E        | &amp;gt;              |

- Within attribute values, the QUOTATION MARK (U+0022) character is replaced with its predefined XML entity &amp;quot;.
- If the input document has a DTD declaration, the declaration is not preserved, and no markup based on the DTD is generated.
- If the input document contains CDATA sections, those sections are not preserved in the output.
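The retrieval-time differences listed above resemble what happens when any XML parser reads a document and a serializer writes it back out. The following sketch uses Python's `xml.etree.ElementTree` to illustrate the effect; it is an analogy under that assumption, not DB2 serialization itself, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with an XML declaration, a DTD, a CDATA section,
# and a quotation mark inside an attribute value.
original = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>\n'
    b'<note title="say &quot;hi&quot;"><![CDATA[1 < 2 & 3 > 2]]></note>'
)

root = ET.fromstring(original)

# Serialize back to text without requesting an XML declaration.
retrieved = ET.tostring(root, encoding="unicode")

print(retrieved)
```

In the serialized result, the declaration, the DTD, and the CDATA section are gone, and the special characters appear as predefined entities instead. An application that compares stored and retrieved documents should therefore compare parsed content rather than raw text.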