Extracting Open Hub data by using InfoSpokes

This section describes how to define the Open Hub Extract stage properties, define the communication between SAP BW and IBM® InfoSphere® DataStage®, define an Open Hub Extract job, perform an extraction, and verify an extraction.

When you use it from the InfoSphere DataStage and QualityStage® Designer client, the Open Hub Extract stage:

  • extracts data and metadata from the SAP BW system.
  • lets you initiate data extraction from either SAP BW or InfoSphere DataStage.
  • uses certified APIs to access extracted metadata and data in the SAP BW system.
  • extracts any type of data that the Open Hub architecture supports.
  • checks SAP BW Process Chain status and logs unsuccessful operations.
  • supports server and parallel jobs.

Through version 4.3.3, the stage run time does not support parallel execution mode. Regardless of the execution mode that the stage is designed with (parallel or sequential), the stage always runs sequentially. Starting with version 4.3.3.1, the Open Hub Extract stage also supports parallel execution mode, so you can use the DataStage parallel processing engine to run the stage across multiple nodes for better performance and scalability.
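
The version rule above can be expressed as a small decision function. The following Python sketch is illustrative only and is not part of the product: the function and constant names are hypothetical, and it assumes that versions are plain dotted numbers such as 4.3.3.1.

```python
def parse_version(text):
    """Turn a dotted version such as '4.3.3.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# First version that, according to the text above, honors parallel mode.
PARALLEL_SINCE = parse_version("4.3.3.1")


def effective_execution_mode(stage_version, designed_mode):
    """Return the mode the stage runs in: 'parallel' or 'sequential'.

    Hypothetical helper: it only restates the documented rule.
    """
    if designed_mode not in ("parallel", "sequential"):
        raise ValueError("designed_mode must be 'parallel' or 'sequential'")
    # Versions before 4.3.3.1 always run sequentially, whatever the design says.
    if parse_version(stage_version) < PARALLEL_SINCE:
        return "sequential"
    return designed_mode
```

For example, a stage designed for parallel execution yields "sequential" at version 4.3.3 and "parallel" at version 4.3.3.1.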

The stage has a custom GUI that lets you specify a Source System, an InfoSpoke, and a Process Chain from the SAP BW system, and display the corresponding Transfer Structure for the InfoSpoke.

The Open Hub stage also has a runtime stage and an RFC Server to extract data from SAP BW. (A separate instance of the RFC Server runs continuously for each Source System that is supported on an InfoSphere DataStage system.) Additionally, it has an RFC Server Manager that starts and stops the individual RFC Server instances, depending on which Source Systems are supported.
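
The relationship between the RFC Server Manager and the per-Source-System RFC Server instances can be pictured as a reconciliation step: start an instance for each newly supported Source System, and stop the instance for each one that is no longer supported. The Python sketch below is a conceptual model only. The class, its method, and the Source System names are hypothetical, and it does not reflect how the product implements or exposes the manager.

```python
class RfcServerManagerSketch:
    """Conceptual model: one RFC Server instance per supported Source System."""

    def __init__(self):
        # Names of Source Systems that currently have a running instance.
        self.running = set()

    def reconcile(self, supported_source_systems):
        """Align running instances with the supported Source Systems.

        Returns two sorted lists: the instances to start and the instances
        to stop. A real manager would launch or end processes here; this
        sketch only tracks names.
        """
        supported = set(supported_source_systems)
        started = sorted(supported - self.running)
        stopped = sorted(self.running - supported)
        self.running = supported
        return started, stopped


manager = RfcServerManagerSketch()
first = manager.reconcile(["BW1", "BW2"])   # both instances are started
second = manager.reconcile(["BW2", "BW3"])  # BW3 is started, BW1 is stopped
```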