IBM InfoSphere Streams Version 4.1.0

Use cases and application types

Telecommunications companies often have a similar set of requirements and challenges that can be solved with InfoSphere® Streams applications that use the Telecommunications Event Data Analytics application framework.

Often, these companies must find a way to handle the following tasks:
  • Ingest and process data from telecommunications networks (metadata, message contents)
  • Enrich network data with customer data, for example, the subscriber type
  • Store records in a data warehouse for downstream applications and archival purposes
  • Filter and detect specific events, for example, dropped calls
  • Provide usage data and KPIs, for example, per subscriber, per cell, and per service
Some of their typical use cases are:
  • Revenue assurance and business intelligence
  • Campaign management
  • User experience, user behavior, and statistics
  • Network and services usage
  • Fraud detection
The primary challenges that these companies face are:
  • Dealing with different network elements (types and vendors) and data formats
  • Achieving high reliability without data loss and consistent end-to-end processing
Their typical processing steps and non-functional requirements are:
  • Reading, parsing, and decoding ASN.1, binary, and ASCII data records from various network elements
  • Dealing with multiple input directories, file types, and priorities
  • De-duplicating input data at the file and record level, for example, against one week's worth of data
  • Looking up data from large static tables to enrich network data
  • Grouping processing for load sharing
  • Checkpointing intermediate data and status regularly for recovery
  • Loading data into databases with high throughput
  • Scaling applications flexibly
  • Implementing custom business logic in a reliable and proven framework application
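
Two of the processing steps above, de-duplicating records against a retention window and enriching them from a static lookup table, can be sketched as follows. This is a minimal illustration in Python, not the framework's implementation: the class, the function names, the record fields (msisdn, call_id, arrival_time), and the sample lookup table are all hypothetical, and the sketch assumes that arrival times never decrease.

```python
from collections import OrderedDict

WEEK_SECONDS = 7 * 24 * 3600

# Hypothetical static lookup table: subscriber number -> subscriber type.
SUBSCRIBER_TYPES = {"491700000001": "prepaid", "491700000002": "postpaid"}


class WindowedDeduplicator:
    """Remembers record keys for a retention period and flags repeats."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._seen = OrderedDict()  # key -> time of first sighting, oldest first

    def is_duplicate(self, key, now):
        # Evict keys that have aged out of the retention window.
        # Assumes `now` never decreases, so insertion order equals time order.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.retention_seconds:
                break
            self._seen.popitem(last=False)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


def enrich(record, lookup, default="unknown"):
    """Return a copy of the record with the subscriber type added."""
    enriched = dict(record)
    enriched["subscriber_type"] = lookup.get(record["msisdn"], default)
    return enriched


def process(records, dedup, lookup):
    """Drop duplicate records, then enrich the remaining ones."""
    output = []
    for record in records:
        key = (record["msisdn"], record["call_id"])
        if dedup.is_duplicate(key, record["arrival_time"]):
            continue
        output.append(enrich(record, lookup))
    return output
```

A production system would also need to persist the set of seen keys, for example as part of a regular checkpoint, so that de-duplication survives a restart.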

Application Types

The following examples outline the typical applications, their requirements, and the features that the application framework provides or supports.

Extract-Transform-Load (ETL) applications

ETL applications extract data from multiple data sources in multiple formats (CSV, ASN.1, or other binary formats), transform the data for storage in a proper and unified format or structure, and load it into a database or data warehouse.

Typical transformation types are listed below. The types that are built in or supported by the application framework are marked with an asterisk (*):
  • Selecting a subset of records (filtering) or of the available data (reduction) (*)
  • Applying any form of simple or complex validation (*)
  • Translating values, for example, 1 to male or male to M (*)
  • Deriving new calculated values (*)
  • Sorting records
  • Joining data from multiple sources
  • Deduplicating data (*)
  • Aggregating data (*)
  • Generating surrogate keys (*)
  • Turning multiple columns into multiple rows (*)
  • Turning multiple rows into multiple columns
  • Splitting a column into multiple columns (*)
  • Disaggregating repeating data
  • Looking up relevant or referential data for slowly changing dimensions (*)
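
Several of these transformation types can be illustrated with a short Python sketch: validation and filtering, value translation, a derived value, splitting a column, turning multiple columns into multiple rows, surrogate key generation, and writing the result as flat CSV. The record layout, the field names, and the code table are hypothetical and chosen only for illustration; the sketch does not show how the application framework implements these steps.

```python
import csv
import io
import itertools

# Hypothetical code table for value translation.
GENDER_LABELS = {"1": "male", "2": "female"}

FIELDNAMES = ["key", "service", "units", "gender", "duration_s", "mcc", "mnc", "cell"]


def transform(record, key_source):
    """Return the output rows for one input record (empty if filtered out)."""
    # Deriving a calculated value, then validating and filtering on it.
    duration = record["end_time"] - record["start_time"]
    if duration <= 0:
        return []
    # Splitting one column ("mcc-mnc-cell") into three columns.
    mcc, mnc, cell = record["cell_id"].split("-")
    base = {
        "gender": GENDER_LABELS.get(record["gender"], "unknown"),  # translation
        "duration_s": duration,
        "mcc": mcc,
        "mnc": mnc,
        "cell": cell,
    }
    # Turning multiple columns (voice_units, sms_units) into multiple rows,
    # each with its own surrogate key.
    rows = []
    for service in ("voice", "sms"):
        rows.append({
            "key": next(key_source),
            "service": service,
            "units": record[service + "_units"],
            **base,
        })
    return rows


def to_csv(rows, fieldnames):
    """Serialize rows as flat CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A call such as `to_csv(transform(record, itertools.count(1)), FIELDNAMES)` yields one header line followed by one line per service for a valid record.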

Applications that use the application framework write the data to simple flat CSV files. These files can then be loaded into databases or data warehouses by using other InfoSphere Streams toolkits or the DbLoader application, which is available on GitHub in the IBMStreams/samples directory.

Related link: Extract, transform, load

Campaign Management applications

Campaign management systems target customer groups according to selected criteria. The system sends campaign-related materials, such as special offers, to targeted customers. For example, a telecommunications company has a customer who experienced five dropped calls in one day. This customer receives a discount or free minutes to prevent contract cancellation. Or, another customer who calls only five numbers receives an offer to pay a small fee in exchange for a discount on each call to these five numbers.

Typical KPIs are:
  • Number of dropped calls within a time period
  • Voice minutes within a time period
  • Voucher recharges within a time period

The source systems for a campaign management system can be mobile network elements, such as a mobile switching center (MSC) that stores call detail records (CDRs).

An InfoSphere Streams application that uses the application framework extracts and transforms the input data, calculates the required KPIs in near real time, and, when a threshold is crossed, sends an event to another system that is responsible for providing the appropriate offer.
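
The KPI and threshold logic for the dropped-call example can be sketched in Python as a sliding time window per subscriber. This is an illustrative sketch only: the class name, the event layout, and the decision to reset the window after an event are assumptions, not behavior of the application framework. The sketch also assumes that timestamps arrive in non-decreasing order for each subscriber.

```python
from collections import defaultdict, deque

DAY_SECONDS = 24 * 3600


class DroppedCallMonitor:
    """Counts dropped calls per subscriber in a sliding time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._drops = defaultdict(deque)  # subscriber -> drop timestamps

    def observe(self, subscriber, timestamp, dropped):
        """Record one call; return an event dict if the threshold is reached."""
        if not dropped:
            return None
        drops = self._drops[subscriber]
        drops.append(timestamp)
        # Discard dropped calls that fell out of the window.
        while drops and timestamp - drops[0] >= self.window_seconds:
            drops.popleft()
        if len(drops) >= self.threshold:
            event = {
                "subscriber": subscriber,
                "kpi": "dropped_calls",
                "value": len(drops),
                "at": timestamp,
            }
            # Reset so that the same dropped calls do not trigger a second event.
            drops.clear()
            return event
        return None
```

With `DroppedCallMonitor(DAY_SECONDS, 5)`, the fifth dropped call of a subscriber within 24 hours returns an event that a downstream campaign system could consume; all other calls return `None`.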