IBM InfoSphere Streams Version 4.1.0

Operator XMLParse


The XMLParse operator accepts a single input stream that carries XML data and generates tuples from that data.

Checkpointed data

When the XMLParse operator is checkpointed, any partially parsed input data is saved in the checkpoint. Logic state variables (if present) are also included in the checkpoint.

Behavior in a consistent region

The XMLParse operator can be an operator within the reachability graph of a consistent region, but it cannot be the start of a consistent region. When in a consistent region, the operator checkpoints and resets any partially parsed input data. Logic state variables (if present) are also automatically checkpointed and reset.
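As a sketch of this arrangement, the application below places XMLParse downstream of a FileSource that starts a consistent region. The file name books.xml, the trigger path, and the 30-second period are illustrative assumptions. Consistent regions also need a JobControlPlane operator in the application, and the sketch assumes that operator resolves by its simple name without an explicit use directive.

```spl
use spl.XML::XMLParse;

composite ConsistentXml {
    graph
        // Coordinates the drain and reset protocol of consistent regions.
        () as JCP = JobControlPlane() {}

        // The source starts the region. XMLParse cannot start it, but it can
        // sit downstream in the region's reachability graph.
        @consistent(trigger = periodic, period = 30.0)
        stream<rstring line> Lines = FileSource() {
            param file   : "books.xml";   // hypothetical input file
                  format : line;
        }

        // Partially parsed input and logic state are checkpointed when the
        // region drains and restored when the region is reset.
        stream<rstring title> Books = XMLParse(Lines) {
            param trigger : "/bookstore/book";
            output Books : title = XPath("title/text()");
        }
}
```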

Checkpointing behavior in an autonomous region

When the XMLParse operator is in an autonomous region and configured with the config checkpoint : periodic(T) clause, no checkpoint is taken at runtime. Upon restart, the operator is restored to its initial state.

When the XMLParse operator is in an autonomous region and configured with the config checkpoint : operatorDriven clause, no checkpoint is taken at runtime. Upon restart, the operator is restored to its initial state.

Such checkpointing behavior is subject to change in the future.

Exceptions

The XMLParse operator throws an exception and terminates in the following cases:
  • If the XML is invalid.
  • If the parsing parameter is set to strict (the default) and XML data cannot be converted to the type of the corresponding SPL attribute.

Summary

Ports
This operator has 1 input port and 1 or more output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 10 parameters.

Required: trigger

Optional: xmlInput, parsing, flatten, attributesName, textName, nullify, ignorecase, ignoreNamespaces, ignorePrefix

Metrics
This operator reports 1 metric.

Properties

Implementation
C++
Threading
Always - Operator always provides a single-threaded execution context.

Input Ports

Ports (0)

The XMLParse operator has one input port, which contains XML to be converted to tuples.

The XMLParse operator accepts as input a single stream that contains an attribute with the XML data to convert. That attribute must have type rstring, ustring, blob, or xml. If the attribute type is xml, each value represents a complete XML document. If the attribute type is rstring, ustring, or blob, each value might contain a chunk of XML that is not well-formed on its own, and an XML document might span multiple input tuples. The XMLParse operator treats the chunks as if they were concatenated, and the concatenated XML can contain multiple sequential XML documents.
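Here is a minimal sketch of the chunked case. Each line of a file becomes one rstring tuple, and XMLParse parses the lines as one concatenated XML stream, so an element can begin in one tuple and end in a later one. The file name books.xml and the bookstore/book layout are assumptions for illustration.

```spl
use spl.XML::XMLParse;

composite ChunkedXml {
    graph
        // Each input line (without its newline) becomes one tuple; a single
        // <book> element may span several of these tuples.
        stream<rstring line> Lines = FileSource() {
            param file   : "books.xml";   // hypothetical input file
                  format : line;
        }

        // Lines has only one attribute, so xmlInput can be omitted.
        stream<rstring title> Books = XMLParse(Lines) {
            param trigger : "/bookstore/book";
            output Books : title = XPath("title/text()");
        }
}
```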


Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
XMLPathFunctions
<any T> T AsIs(T)

Passthrough function: returns its argument unchanged.

public rstring XPath(rstring xpathExpn)

Extracts a scalar value from a nodeset that contains a single node.

public list<rstring> XPathList(rstring xpathExpn)

Extracts a list of scalars from XML.

<tuple T> public T XPath(rstring xpathExpn, T tupleLiteral)

Extracts a nested tuple value from a nodeset that contains a single node.

<any T> public list<T> XPathList(rstring xpathExpn, T elements)

Extracts a list of objects from XML.

public map<rstring,rstring> XPathMap(rstring xpathExpn)

Extracts a map of XML attributes.
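The following sketch shows how these output functions might be combined in one output clause. It assumes book elements shaped like the XML in the comment and XPath expressions written relative to the trigger element. Two further assumptions: the inner XPath calls in the tuple literal are evaluated relative to the node that the outer expression selects, and "@*" selects all XML attributes for XPathMap.

```spl
use spl.XML::XMLParse;

// Assumed input shape:
// <bookstore>
//   <book id="b1" lang="en">
//     <title>SPL Basics</title>
//     <author>Ann</author><author>Bo</author>
//     <publisher><name>ACME</name><city>Oslo</city></publisher>
//   </book>
// </bookstore>
composite BookFields {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "books.xml";   // hypothetical input file
                  format : line;
        }

        stream<rstring id, rstring title, list<rstring> authors,
               tuple<rstring name, rstring city> publisher,
               map<rstring, rstring> attrs> Books = XMLParse(Lines) {
            param trigger : "/bookstore/book";
            output Books :
                // Scalar from a single attribute node and a single text node.
                id      = XPath("@id"),
                title   = XPath("title/text()"),
                // One list entry per <author> element.
                authors = XPathList("author/text()"),
                // Nested tuple built from the single <publisher> node.
                publisher = XPath("publisher",
                                  { name = XPath("name/text()"),
                                    city = XPath("city/text()") }),
                // All XML attributes of <book> as a map.
                attrs   = XPathMap("@*");
        }
}
```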

Ports (0)

The XMLParse operator is configurable with one or more output ports, which have tuples generated from XML input.

Each output port generates tuples that correspond to one subtree of the input XML. The trigger parameter, which uses a subset of XPath, specifies the subtree of the XML document that triggers a tuple, and each output stream corresponds to one expression in the trigger parameter. Tuples are generated as the XML documents are parsed: a tuple is submitted on a stream when the end tag of the element that the trigger parameter identifies for that stream is seen, and a WindowMarker punctuation is generated at the end of each XML document. If errors occur while the XML is parsed, the errors are logged, and no tuples are generated until the start of the next trigger element. Receipt of a WindowMarker punctuation resets the XMLParse operator, causing it to start parsing from the beginning of a new XML document.
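As a sketch of the one-expression-per-port rule, the invocation below declares two output streams, so the trigger parameter takes two expressions in the same order as the streams. The catalog layout, file name, and stream names are assumptions.

```spl
use spl.XML::XMLParse;

composite TwoPorts {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "catalog.xml";   // hypothetical input file
                  format : line;
        }

        // Port 0 (Books) emits a tuple at each </book> end tag and port 1
        // (Magazines) at each </magazine> end tag. A WindowMarker punctuation
        // is generated at the end of every XML document.
        (stream<rstring title> Books; stream<rstring name> Magazines) =
            XMLParse(Lines) {
                param trigger : "/catalog/book", "/catalog/magazine";
                output Books     : title = XPath("title/text()");
                       Magazines : name  = XPath("name/text()");
            }
}
```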


Ports (1...)

Tuples generated from XML input


Parameters

This operator supports 10 parameters.
xmlInput

Specifies which attribute of the input stream carries the XML data that the operator parses. If there is only one attribute in the input stream, this parameter is optional.
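When the input stream has more than one attribute, xmlInput must name the one that holds the XML. In the sketch below, a Beacon emits a single hypothetical two-attribute tuple with an inline XML document. The attribute names source and payload are illustrative.

```spl
use spl.XML::XMLParse;

composite PickXmlAttribute {
    graph
        // One tuple carrying a label and a complete XML document.
        stream<rstring source, rstring payload> Messages = Beacon() {
            param iterations : 1u;
            output Messages :
                source  = "inline",
                payload = "<bookstore><book><title>SPL</title></book></bookstore>";
        }

        // Messages has two attributes, so xmlInput selects the XML one.
        stream<rstring title> Books = XMLParse(Messages) {
            param xmlInput : payload;
                  trigger  : "/bookstore/book";
            output Books : title = XPath("title/text()");
        }
}
```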


trigger

Specifies the subtree of the XML document that triggers a tuple to be output. This parameter is a list of rstring values, one for each output stream, in output stream declaration order. Each rstring contains an absolute XPath expression that identifies the top-level element of a subtree within the XML document. The XPath expression is a UTF-8 string value.


parsing

Specifies the parsing behavior of the XMLParse operator. The valid values are strict and permissive. The default value is strict.

When the parameter value is strict, an exception is raised for invalid conversions of XML data to SPL attributes and the operator terminates. When the parameter value is permissive, an error is logged and execution continues.
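A sketch of permissive parsing follows. It assumes an implicit mapping (no output assignments, with flatten set to elements), so the operator itself converts the text of each year element to int32. Under the default strict setting, a value such as "n/a" would terminate the operator. Under permissive, the error is logged and execution continues. The element names and file name are assumptions.

```spl
use spl.XML::XMLParse;

composite PermissiveBooks {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "books.xml";   // hypothetical input file
                  format : line;
        }

        // title and year are filled from child elements of the same name;
        // a non-numeric <year> is logged instead of ending the operator.
        stream<rstring title, int32 year> Books = XMLParse(Lines) {
            param trigger : "/bookstore/book";
                  flatten : elements;
                  parsing : permissive;
        }
}
```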


flatten

Specifies how scalar (or list<scalar>) attributes in the output tuple definition are interpreted when implicit XPath expressions are generated. The valid values are attributes, elements, and none. The default value is none.
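To contrast two of the flatten values, the sketch below reads the same book element twice with implicit XPath. It assumes two mappings for a scalar SPL attribute x: under attributes, x is matched against the XML attribute x; under elements, x is matched against the text of the child element x. The behavior of none is not illustrated here, and the shop layout is hypothetical.

```spl
use spl.XML::XMLParse;

// Assumed input: <shop><book id="b1"><id>legacy-7</id></book></shop>
composite FlattenModes {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "shop.xml";   // hypothetical input file
                  format : line;
        }

        // flatten : attributes -> id is assumed to come from id="b1".
        stream<rstring id> FromAttributes = XMLParse(Lines) {
            param trigger : "/shop/book";
                  flatten : attributes;
        }

        // flatten : elements -> id is assumed to come from <id>legacy-7</id>.
        stream<rstring id> FromElements = XMLParse(Lines) {
            param trigger : "/shop/book";
                  flatten : elements;
        }
}
```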


attributesName

Specifies the name of the SPL attribute that holds XML attributes when implicit XPath is used. The default value is _attrs.


textName

Specifies the name of the SPL attribute that holds element text content when implicit XPath is used. The default value is _text.
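Here is a sketch of the two naming parameters used together. It assumes that, under implicit XPath, a map<rstring,rstring> attribute named by attributesName receives the trigger element's XML attributes, and an attribute named by textName receives the element's text. The names props and label and the inventory layout are hypothetical.

```spl
use spl.XML::XMLParse;

// Assumed input: <inventory><item sku="A1" color="red">Widget</item></inventory>
composite ItemAttrs {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "inventory.xml";   // hypothetical input file
                  format : line;
        }

        // Without these two parameters, the attributes would have to be
        // named _attrs and _text (the defaults).
        stream<map<rstring, rstring> props, rstring label> Items =
            XMLParse(Lines) {
                param trigger        : "/inventory/item";
                      attributesName : "props";  // assumed to receive sku, color
                      textName       : "label";  // assumed to receive the text
            }
}
```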


nullify

Specifies whether attributes that are missing from the XML default to null values.


ignorecase

Specifies whether to ignore the case of element and attribute names.


ignoreNamespaces

Specifies whether to ignore namespaces in names. If the parameter value is true, the leading namespace prefix and colon (for example, foo: in foo:bar) are ignored, and names in the XML are compared by local name only. By default, the parameter value is false and the whole name, including the colon (:), is used, so a name such as foo:bar can be matched only by using XPath("foo:bar") or a similar function.
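A sketch of ignoreNamespaces with a namespaced catalog document follows. With the parameter set to true, the trigger and the XPath expressions are assumed to use local names only. The namespace URI, prefix, and element names are illustrative.

```spl
use spl.XML::XMLParse;

// Assumed input:
// <c:catalog xmlns:c="urn:example:catalog">
//   <c:book><c:title>SPL</c:title></c:book>
// </c:catalog>
composite LocalNames {
    graph
        stream<rstring line> Lines = FileSource() {
            param file   : "catalog.xml";   // hypothetical input file
                  format : line;
        }

        // The c: prefixes are ignored during matching, so local names suffice.
        // With the default (false), the trigger would be "/c:catalog/c:book".
        stream<rstring title> Books = XMLParse(Lines) {
            param trigger          : "/catalog/book";
                  ignoreNamespaces : true;
            output Books : title = XPath("title/text()");
        }
}
```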


ignorePrefix
Specifies a string that, if present, is removed from the start of an SPL attribute name that is used to form an implicit XPath directive. You can use this parameter to handle XML that contains elements or attributes whose names are SPL or C++ keywords. For example:
stream <rstring __graph> A = XMLParse(Input) {
  param trigger      : "/a";
        flatten      : elements;
        ignorePrefix : "__";
}

This example accepts XML of the following form:

<a>
  <graph>value</graph>
</a>

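Because graph is an SPL keyword, the declaration stream<rstring graph> A = XMLParse(Input) is not valid SPL. The __ prefix avoids the keyword and is stripped before the implicit XPath is formed.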


Code Templates

Implicit XMLParse
stream<${schema}> ${outputStream} = XMLParse(${inputStream}) {
    param
        trigger : ${triggerExpression};
}

Explicit XMLParse
stream<${schema}> ${outputStream} = XMLParse(${inputStream}) {
    param
        trigger : ${triggerExpression};
    output
        ${outputStream} : ${outputAttribute} = ${value};
}

Metrics

nInvalidTuples - Counter

The number of tuples that failed to convert from XML to an SPL tuple.

Libraries

xml-spl
Library Name: streams-stdtk-xml
Include Path: ../../../impl/include