IBM InfoSphere Streams Version 4.1.1

Toolkit com.ibm.streams.text 2.1.1

General Information

The Text Toolkit includes the TextExtract operator, which extracts information from text data.

The Text Toolkit integrates the Text Analytics component of IBM InfoSphere BigInsights Version 4.1.0.0, which provides a system for extracting information from text data.

By using the Text Toolkit, a streams processing application can read text data and derive structured information from it according to a set of rules. These rules are defined in extractors, which are programs that extract information from within a text field.

Extractors can be written manually in AQL (Annotation Query Language), or they can be created by using the Information Extraction Web Tool in BigInsights Version 4.0 and later. To launch the web tool, click Text Analytics on the BigInsights home page. After you create an extractor in the web tool, use the "Export AQL" feature to generate the AQL for the extractor so that it can be used in Streams.

The output of an extractor is annotated text that identifies the specific information that is important to your business. By using the TextExtract operator in your application, you can output this information as tuples on a data stream.

BigInsights Text Analytics includes a set of extractors that extract mentions of general information from input text, such as names, e-mail addresses, and currency. These pre-built extractor libraries can be used on their own or within custom extractors.
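The idea of rules that turn raw text into structured tuples can be illustrated with a minimal Python sketch. This is a conceptual analogue only: it is not AQL and does not use the TextExtract operator or any Text Toolkit API. The `Mention` type, the `RULES` table, and the `extract` function are hypothetical names chosen for this illustration, and the two regular expressions are simplified stand-ins for the pre-built e-mail and currency extractors.

```python
import re
from typing import Iterator, NamedTuple


class Mention(NamedTuple):
    """One structured result: the rule that fired, the matched text, and its span."""
    view: str
    text: str
    begin: int
    end: int


# Hypothetical rules for illustration only; real extractors are AQL views,
# not Python regular expressions.
RULES = {
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Currency": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def extract(document: str) -> Iterator[Mention]:
    """Apply every rule to the document and yield one tuple per match."""
    for view, pattern in RULES.items():
        for match in pattern.finditer(document):
            yield Mention(view, match.group(), match.start(), match.end())


if __name__ == "__main__":
    for mention in extract("Send $25.00 to billing@example.com today."):
        print(mention)
```

Each yielded `Mention` plays the role of one output tuple: it names the rule that matched and records where in the input text the match was found, which is roughly the kind of annotation an extractor view produces.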

The unit of compilation in the Text Toolkit is a module, which consists of one or more AQL files in a directory. Modules can take input that is specified at run time in the form of dictionaries and tables. When a module is compiled, the result is a text analytics module (TAM) file.

NOTE: The sentiment extractors are not supported on IBM Power Systems servers that run in little-endian mode.

Additional information

BigInsights Text Analytics documentation

Toolkit structure
The Text Toolkit provides tools that help you process unstructured text.
Developing and running applications that use the Text Toolkit
To create applications that use the Text Toolkit, you must configure either Streams Studio or the SPL compiler to be aware of the location of the toolkit.
createTypes script
The createTypes.pl script creates applications from modules that contain Annotation Query Language (AQL) files. For example, after you create extractors in the Information Extraction Web Tool in BigInsights, you can use the script to generate an SPL application that uses those extractors. The generated streams processing applications can serve as a starting point for development, and the generated types might make SPL applications that involve text analytics easier to maintain.
Version
2.1.1
Required Product Version
4.0.0.0