Hierarchical data transformation

Use the Hierarchical Data stage to create powerful hierarchical transformations, parse and compose JSON/XML data, and invoke REST web services with high performance and scalability.

Many industries have standardized on Extensible Markup Language (XML) as the mechanism for exchanging information between organizations. With the broader acceptance and use of these standards, companies have increasingly looked to XML to also satisfy requirements for the exchange of information between different IT units within their organization. The business projects for which XML has been adopted have generated specific requirements for IT. In some cases, the data volume that is represented in an XML document is minimal. For example, the data might represent a single transaction, but it might have many layers of hierarchical complexity. Other projects require that multi-gigabyte files with relatively simple XML schemas be transformed into a new format that conforms to an industry standard. When large data volumes and complex hierarchies occur together, they often present their own challenges to traditional IT tools.

The Hierarchical Data stage includes capabilities that manage the design and processing requirements presented by the most challenging XML sources. The IT developer can use the schema library manager to register the XML metadata in its native form. This metadata forms the basis that guides the design activities. The Hierarchical Data stage uses an integrated user interface component called the assembly editor to facilitate the transformation of XML data from hierarchical to relational or other hierarchical formats, or from relational to hierarchical formats. After the logic is constructed, the job runtime uses components that provide various forms of parallelism and that are built specifically for hierarchical data formats, such as XML. With these mechanisms, IBM® InfoSphere® DataStage® scales to meet very high volumes and manages system resources efficiently.
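
To make the hierarchical-to-relational transformation described above more concrete, the following is a minimal Python sketch of the general idea, not of how the Hierarchical Data stage or the assembly editor is implemented. The sample document, its element names (orders, order, items, item), and the flatten_orders function are all hypothetical and chosen only for illustration. The sketch produces one flat row per nested item and repeats the parent order fields on each row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: orders with nested line items.
SAMPLE_XML = """
<orders>
  <order id="1001">
    <customer>Acme</customer>
    <items>
      <item sku="A-1"><qty>2</qty></item>
      <item sku="B-7"><qty>5</qty></item>
    </items>
  </order>
  <order id="1002">
    <customer>Globex</customer>
    <items>
      <item sku="C-3"><qty>1</qty></item>
    </items>
  </order>
</orders>
"""


def flatten_orders(xml_text):
    """Return one relational row per <item>, repeating the parent order fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.findall("order"):
        # Parent-level values are copied onto every child row.
        order_id = order.get("id")
        customer = order.findtext("customer")
        for item in order.findall("items/item"):
            rows.append({
                "order_id": order_id,
                "customer": customer,
                "sku": item.get("sku"),
                "qty": int(item.findtext("qty")),
            })
    return rows


rows = flatten_orders(SAMPLE_XML)
for row in rows:
    print(row)
```

In this sketch the repeating item element determines the row granularity, which is the main design decision in any hierarchical-to-relational mapping: each occurrence of the chosen repeating element becomes one output row.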

Choosing an XML solution

InfoSphere DataStage provides two XML solutions: the XML pack and the Hierarchical Data stage. The XML pack, which includes the XML Input, XML Output, and XML Transformer stages, is useful if you have already made an investment in using this technology or if you want to perform only very simple transformations that do not involve a large amount of data. The Hierarchical Data stage is the best choice if you have not yet created an XML solution and want to perform complex transformations on large amounts of data.