DataStage XML Input stage fails with "Xalan fatal error" due to Invalid character (Unicode: 0x0) and Invalid document structure.

Technote (troubleshooting)


Problem(Abstract)

The DataStage XML Input stage fails when processing XML data with the following errors:

XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 5, column: 323): Invalid character (Unicode: 0x0)

XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 1): Invalid document structure

Diagnosing the problem

When the XML Input stage reports a "Xalan fatal error" during document parsing, it generally means that the XML document has invalid contents or invalid formatting.

The specific pair of errors:

  • Invalid character (Unicode: 0x0)
  • Invalid document structure
are often an indication that the XML document was truncated mid-record.

XML file truncation is usually not a problem when the XML Input stage has been configured to read the XML file directly. However, when the XML file is instead read by the Sequential File stage and the contents of the file are passed to the XML Input stage as a single record, the above errors occur because the Sequential File stage breaks the XML input file into multiple records.

Although the Sequential File stage does have an implicit record mode that can combine multiple records into a single record, this mode is still limited in size: implicit record processing handles data in 512-byte chunks, which are then sent to the next stage as separate records. Thus, using a Sequential File stage with implicit records may cause the XML Input stage to report the above two errors any time the XML file exceeds 512 bytes, because the record sent to the XML Input stage is not a complete XML document.
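
To illustrate why a 512-byte piece of a larger document cannot be parsed on its own, the following Python sketch splits a sample XML document into 512-byte chunks and tries to parse each chunk independently. It runs outside DataStage and only imitates the chunking described above; the sample document and the function names are hypothetical, and Python's parser reports different messages than Xalan does.

```python
import xml.etree.ElementTree as ET

CHUNK_SIZE = 512  # chunk size described in this technote


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as one complete XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def build_sample_document(rows: int = 40) -> bytes:
    """Build a hypothetical XML document that is larger than one chunk."""
    body = "".join(f'<row id="{i}">value {i}</row>' for i in range(rows))
    return f'<?xml version="1.0"?><rows>{body}</rows>'.encode("utf-8")


document = build_sample_document()
chunks = split_into_chunks(document)

print("whole document well-formed:", is_well_formed(document))
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number} ({len(chunk)} bytes) well-formed:", is_well_formed(chunk))
```

The whole document parses, but no individual chunk does, which is the situation the XML Input stage is in when it receives one chunk as if it were the full document.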

The recommended solution for this issue is to configure the XML Input stage to read the XML file directly. This is done by passing the name of the file instead of the contents of the file to the XML Input stage.

Use the following steps to switch a job running on a Unix platform to use the External Source stage to build a list of input XML files instead of reading the full file with a Sequential File stage:
  1. Delete the Sequential File input stage and associated output link
  2. Add an External Source stage and link it to XML Input stage.
  3. Edit the External Source stage, output tab, and set the following 2 fields:
    Source Method = specific programs
    Source Program = ls (full path to file)
    for example, "ls /opt/xmldata/testdocument.xml"
    The goal is that the command outputs the fully qualified filename and nothing else (such as file sizes, permissions, or column headings). There are many ways to send a filename to the XML Input stage in DataStage; this is just one example.
  4. Go to the Columns tab and add a column named "Filename" of type VarChar (no length needed)
  5. Edit the XML Input stage, Input tab. Select the URL/File Path radio button, then select "Filename" in the XML Source Column selection list.
  6. Compile and run job.
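
Before wiring a command into the External Source stage, it can help to confirm that its output really is just fully qualified filenames. The following Python sketch is one possible check, assuming the command output has already been captured as text (for example, by running the command by hand). The function name `find_problem_lines` and the sample listing line are hypothetical and are not part of DataStage.

```python
import os
import tempfile


def find_problem_lines(command_output: str) -> list[str]:
    """Return output lines that are not an absolute path to an existing file."""
    problems = []
    for line in command_output.splitlines():
        if not (os.path.isabs(line) and os.path.isfile(line)):
            problems.append(line)
    return problems


# Demonstration with a temporary file standing in for the real XML document.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "testdocument.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write("<doc/>")

    # Output shaped like plain "ls <full path>": one filename, nothing else.
    clean_output = xml_path + "\n"
    # Output shaped like a long listing: extra columns in front of the name.
    long_listing = f"-rw-r--r-- 1 dsadm dstage 6 Sep  8 2011 {xml_path}\n"

    clean_problems = find_problem_lines(clean_output)
    listing_problems = find_problem_lines(long_listing)

print("problems in plain output:", clean_problems)
print("problems in long listing:", listing_problems)
```

An empty result means every line could be used as a value for the "Filename" column; any line that is returned contains something other than a usable path.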

If you are using a Server job instead of a Parallel job, then use the Folder stage instead of the External Source stage.

Another alternative to using an External Source stage or Folder stage to build a list of fully qualified file names is to add the file names to a sequential file and use that file to provide the XML Input stage with a list of one or more files. For example:
  1. Create a sequential file on the server machine that contains one line with the location of the file, for example:
    /opt/xmldata/testdocument.xml
  2. Add a Sequential File stage to the job and define a single VarChar column named Filename.
  3. Add a link from the Sequential File stage to the XML Input stage.
  4. Edit the XML Input stage, Input tab. Select the URL/File Path radio button, then select "Filename" in the XML Source Column selection list.
  5. Compile and run job.
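
The list file from step 1 can be created by hand or generated. The following Python sketch shows one way to generate it, assuming all XML documents sit in a single directory and have an .xml extension. The function name `write_file_list` and the list file name `xml_files.lst` are hypothetical, and the temporary directory stands in for a real location such as /opt/xmldata.

```python
import tempfile
from pathlib import Path


def write_file_list(xml_dir: str, list_path: str) -> list[str]:
    """Write the absolute path of each .xml file in xml_dir, one per line."""
    paths = sorted(
        str(p.resolve()) for p in Path(xml_dir).glob("*.xml") if p.is_file()
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.xml").write_text("<a/>", encoding="utf-8")
    (Path(tmp) / "b.xml").write_text("<b/>", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("ignored", encoding="utf-8")

    list_file = Path(tmp) / "xml_files.lst"
    written = write_file_list(tmp, str(list_file))
    listed = list_file.read_text(encoding="utf-8").splitlines()

print(f"{len(written)} file names written to the list file")
```

Each line of the resulting file holds one fully qualified file name, which is the form the Sequential File stage passes to the XML Input stage in the "Filename" column.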



Document information


More support for:

InfoSphere DataStage

Software version:

7.5, 8.1, 8.5

Operating system(s):

AIX, HP-UX, Linux, Solaris, Windows

Reference #:

1502464

Modified date:

2011-09-08
