Analyzing data by using data rules

You can define and run data rules, which evaluate or validate specific conditions associated with your data sources. Data rules can be used to extend your data profiling analysis, to test and evaluate data quality, or to improve your understanding of data integration requirements.

To work with data rules, click the Develop icon on the Navigator menu in the console, and then select Data Quality. The Data Quality workspace is the starting point for creating and working with data rules.

From the Data Quality workspace you can:

Characteristics of data rules

You can use data rules to evaluate and analyze conditions found during data profiling, to conduct a data quality assessment, to provide more information to a data integration effort, or to establish a framework for validating and measuring data quality over time.

You can construct data rules generically by using rule definitions. A rule definition describes the rule evaluation or condition. When you associate physical data sources with the definition, you generate a data rule that can be run to return analysis statistics and detailed results. The process of creating a data rule definition and generating the subsequent data rule is shown in the following figure:
Figure 1. Process of creating a data rule definition, generating a data rule, running the data rule, and viewing the output
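The separation between a rule definition and a data rule can be illustrated with a small conceptual sketch. The following Python code is not the Information Analyzer interface. The class names RuleDefinition and DataRule, the bindings dictionary, and the sample column names are all hypothetical and serve only to show how one generic definition might be bound to different physical columns and then run to produce summary statistics and detail results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class RuleDefinition:
    """Generic rule logic written against logical variable names (hypothetical)."""
    name: str
    variables: List[str]
    condition: Callable[..., bool]


@dataclass
class DataRule:
    """A rule definition bound to the columns of one physical data source (hypothetical)."""
    definition: RuleDefinition
    bindings: Dict[str, str]  # logical variable name -> physical column name

    def run(self, records: List[Record]) -> dict:
        met, not_met = [], []
        for rec in records:
            # Resolve each logical variable to the bound column's value.
            args = {var: rec.get(col) for var, col in self.bindings.items()}
            (met if self.definition.condition(**args) else not_met).append(rec)
        return {
            "total": len(records),      # summary statistics
            "met": len(met),
            "not_met": len(not_met),
            "exceptions": not_met,      # detail results
        }


# One generic definition: the value must exist and must not be blank.
value_exists = RuleDefinition(
    name="value_exists",
    variables=["value"],
    condition=lambda value: value is not None and str(value).strip() != "",
)

# Two illustrative data sources with different column names.
customers = [{"CUST_NAME": "Ada"}, {"CUST_NAME": ""}, {"CUST_NAME": None}]
orders = [{"ORDER_ID": 1}, {"ORDER_ID": 2}]

# The same definition generates two data rules through different bindings.
customer_rule = DataRule(value_exists, {"value": "CUST_NAME"})
order_rule = DataRule(value_exists, {"value": "ORDER_ID"})

customer_result = customer_rule.run(customers)
order_result = order_rule.run(orders)
```

In this sketch the definition never names a table or column. Only the binding step ties it to a specific source, which is what allows the same logic to be reused.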
IBM InfoSphere Information Analyzer data rules have the following characteristics:
Reusable
The definitions are not specific to one data source, but can be applied to many data sources.
Quickly evaluated
They can be tested interactively, as they are being created, to ensure that they deliver expected information.
Produce flexible output
Data rules can produce a variety of statistics and results, at both the summary and detail levels. You can evaluate data either positively (how many records meet your condition) or negatively (how many records violate your condition), and control which details you view to understand specific issues.
Historical
They capture and retain execution results over time, allowing you to view, monitor, and annotate trends.
Managed
Each data rule has a defined state, such as draft or accepted, so that you can identify the status of each rule.
Categorical
You can organize data rules within relevant categories and folders.
Deployable
You can transfer data rules to another environment. For example, you can export and transfer a data rule to a production environment.
Audit functionality
You can identify specific events associated with a rule, such as who modified the rule and when it was last modified.
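Two of the characteristics above, flexible output and historical results, can be illustrated with a short conceptual sketch. The following Python code is not part of Information Analyzer. The functions summarize and record_run, the history list, and the sample records are hypothetical. The sketch reports the same run both positively (records that meet the condition) and negatively (records that violate it), and retains each run so that a trend can be viewed over time.

```python
from datetime import date


def summarize(records, condition):
    """Return positive and negative counts and percentages for one run."""
    total = len(records)
    met = sum(1 for r in records if condition(r))
    not_met = total - met

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "total": total,
        "met": met,                    # positive view
        "not_met": not_met,            # negative view
        "met_pct": pct(met),
        "not_met_pct": pct(not_met),
    }


history = []  # retained execution results, oldest first (hypothetical store)


def record_run(run_date, records, condition):
    """Run the condition and keep the summary so that trends can be reviewed."""
    entry = {"run_date": run_date, **summarize(records, condition)}
    history.append(entry)
    return entry


def has_email(record):
    """Illustrative condition: the email field is present and not empty."""
    return bool(record.get("email"))


# Two runs of the same rule against illustrative snapshots of the data.
record_run(
    date(2024, 1, 1),
    [{"email": "a@example.com"}, {"email": ""}, {"email": None}, {"email": "b@example.com"}],
    has_email,
)
record_run(
    date(2024, 2, 1),
    [{"email": "a@example.com"}, {"email": "c@example.com"}, {"email": None}, {"email": "b@example.com"}],
    has_email,
)

# Trend of the positive percentage across the retained runs.
trend = [(h["run_date"].isoformat(), h["met_pct"]) for h in history]
```

Whether to report the met or the not met figure is a presentation choice. Both are derived from the same run, so the sketch stores both and lets the reader of the history pick the view.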