InfoSphere DataStage tasks

You use InfoSphere® DataStage® to develop jobs, which process and transform your data. You can administer, manage, deploy, and reuse these jobs to integrate data across many systems throughout your organization.

Your organization can use InfoSphere DataStage to complete the following tasks:

Process and transform large volumes of data
By handling the collection, integration, and transformation of large volumes of data, your organization can scale data throughput linearly. A scalable platform that includes parallel processing and incorporates flexible, reusable functions enables users to design logic once, and then run and scale that logic anywhere.

By using the parallel processing capabilities of multiprocessor hardware platforms, you can scale transformation jobs to meet both small and large workloads. The degree of parallelism is not fixed in the job design: the deployment configuration automatically applies the degree of parallelism that you specify. By making a simple change to the configuration file, you can change your application from 2-way processing to 32-way processing to 128-way processing.
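As an illustration of how the degree of parallelism can live in the configuration file rather than in the job design, the following Python sketch generates the text of a parallel configuration file with a chosen number of logical nodes. The host name, disk path, and scratch disk path are hypothetical placeholders, and the helper `build_config` is not part of the product; the sketch assumes the usual node, fastname, pools, and resource layout of a configuration file that the APT_CONFIG_FILE environment variable points to.

```python
def build_config(node_count, host="etl-host", disk="/data/ds", scratch="/scratch/ds"):
    """Return parallel configuration file text with node_count logical nodes.

    host, disk, and scratch are hypothetical placeholder values; replace them
    with the host name and directories of your own engine tier.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")

    lines = ["{"]
    for i in range(1, node_count + 1):
        # Each logical node is one unit of parallelism for a parallel job.
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{host}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Going from 2-way to 32-way processing changes only this argument;
    # the job design itself is not touched.
    print(build_config(2))
```

Calling `build_config(32)` or `build_config(128)` produces the larger configurations in the same way, which mirrors the point that the job design stays the same while the configuration file changes.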

Design reusable transformation jobs
Reusable transformation functions enable data integration specialists to maximize speed, flexibility, and effectiveness in their designs.

Data integration specialists use the rich user interface for all design work, including workflow, data integration, and data quality. Prebuilt transformation functions can be dragged to a design, making it easy to determine the flow of information and the transformations that occur. Any portion of the design can be shared and reused across the data integration landscape, maximizing reuse and productivity.

Extend connectivity to various objects
By using common connectors, any data source that is supported by InfoSphere Information Server can be used as input to or output from InfoSphere DataStage, enabling your organization to integrate data effectively across the enterprise.

A wide range of heterogeneous data sources and targets is supported, including text files, complex data structures in XML, enterprise resource planning (ERP) systems such as SAP and PeopleSoft, nearly any database, web services, and business intelligence (BI) tools such as SAS.

Manage operations and resources
By operating in real time, your organization can capture messages or extract data at any moment on the same platform that integrates bulk data and uses transformation rules. Because real-time and bulk processing share one platform, data is available on demand to meet your data integration needs.

Real-time data integration support captures messages from message-oriented middleware (MOM) queues by using Java™ Message Service (JMS) or WebSphere® MQ adapters, and combines the data into operational and historical analysis perspectives. By using InfoSphere DataStage with InfoSphere Information Services Director, you can deploy data integration jobs with JMS, web services, or other services. This service-oriented architecture (SOA) enables numerous developers to share complex data integration processes without having to understand the steps contained in the services.

You can use the InfoSphere DataStage Operations Console to access information about your jobs, job activity, and system resources for each of your InfoSphere Information Server engines. The Operations Console is useful for troubleshooting failed job runs, improving job run performance, and actively monitoring your engines.