IBM InfoSphere Streams Version 4.1.1

Working with governance catalogs in Streams Studio

Use governance catalogs in Streams Studio to provide information governance for your streams processing applications.

About this task

Information governance is an approach to managing, improving, and using information to increase an organization's confidence in its decisions and operational business processes. IBM® InfoSphere® Information Governance Catalog provides the entry point for an organization to understand and govern its information. InfoSphere Streams supports IBM InfoSphere Information Governance Catalog Version 11.3 and Version 11.5.

Assets in a governance catalog include terms, categories, information governance policies, and information governance rules. Before you can work with assets in your streams processing application, you must import the InfoSphere Streams custom asset types into the InfoSphere Information Governance Catalog. Then, add the governance catalog to your Streams Studio environment. The supported asset types then become available in the palette of the Graphical Editor for use in your streams processing application, and you can hover over an asset to view its properties. For details, see the topic about adding catalogs.

When you drop an asset onto the canvas of the Graphical Editor, a corresponding SPL source operator is added to the application. The resulting source operator is configured with the properties of the asset. If you drop the asset onto an existing operator, that operator is updated with the information from the governance catalog.

You can also add InfoSphere Streams assets to a governance catalog so that data lineage and flow metadata can be reported when an application using these assets is submitted. To add an asset, right-click a Sink operator in your project in the Graphical Editor, select Governance Catalog > Create asset in catalog, and select a catalog. You can select these Sink operators for registering an asset:
  • HDFS2FileSink
  • JMSSink
  • KafkaProducer
  • MQTTSink

In some streams processing applications, the detailed configuration for an operator might not be available until run time. If you use such an operator to create an asset in the catalog, you can create the asset with placeholder metadata. Afterward, update the asset with the correct details in the InfoSphere Information Governance Catalog user interface.

Note: You must have the appropriate authority and access to the catalog to create assets. For details about the required security roles, see the product documentation for InfoSphere Information Governance Catalog.

Streams processing applications are governed at the instance level. All jobs within a governed instance report lineage and flow metadata to the catalog. By default, a new instance is not governed. To make an instance governed, set either the domain.governanceEnabled and domain.governanceUrl domain properties or the instance.governanceEnabled and instance.governanceUrl instance properties. Use the streamtool setigcadminconfig command to set the user name and password that the InfoSphere Streams instance uses to access the governance catalog.
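
Example

The following Python sketch shows one way that the property settings described above might be scripted. It only builds the command lines and does not run them. The function name governance_commands, the domain and instance IDs, and the catalog URL are hypothetical placeholders. The sketch assumes that the properties are set with streamtool setproperty (instance level) or streamtool setdomainproperty (domain level) by using the -d and -i options, and that governanceEnabled takes the value true. Verify the syntax against the streamtool reference for your release. The streamtool setigcadminconfig command is not included because its options are not described in this topic.

```python
import shlex


def governance_commands(domain_id, instance_id, catalog_url, scope="instance"):
    """Build streamtool argument lists that turn on governance.

    All argument values are placeholders supplied by the caller. The
    option syntax (-d, -i) is an assumption to check against your release.
    """
    if scope == "instance":
        # Instance-level properties: instance.governanceUrl, instance.governanceEnabled
        base = ["streamtool", "setproperty", "-d", domain_id, "-i", instance_id]
    elif scope == "domain":
        # Domain-level properties: domain.governanceUrl, domain.governanceEnabled
        base = ["streamtool", "setdomainproperty", "-d", domain_id]
    else:
        raise ValueError("scope must be 'instance' or 'domain'")

    # Set the catalog URL first, then enable governance.
    return [
        base + [f"{scope}.governanceUrl={catalog_url}"],
        base + [f"{scope}.governanceEnabled=true"],
    ]


if __name__ == "__main__":
    # Hypothetical IDs and URL, for illustration only.
    for cmd in governance_commands(
        "mydomain", "myinstance", "https://igc.example.com:9443"
    ):
        print(shlex.join(cmd))
```

Building the commands as argument lists instead of one string keeps each property assignment as a single argument, so a URL with unusual characters is not split by the shell. Review the printed commands before you run them.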