You can define and run data rules, which evaluate or validate
specific conditions associated with your data sources. Data rules
can be used to extend your data profiling analysis, to test and evaluate
data quality, or to improve your understanding of data integration
requirements.
To work with data rules, click the Develop icon on the Navigator
menu in the console, and then select Data Quality. The Data Quality
workspace is the starting point for creating and working with
data rules.
From the Data Quality workspace, you can:
- Create data rule definitions, rule set definitions, data rules,
rule sets, and metrics
- Build data rule definition, rule set definition, and metric logic
- Create data rule definition and rule set definition associations
- Associate a data rule definition, rule set definition, metric,
data rule, or rule set with folders
- Associate a data rule definition, rule set definition, metric,
data rule, or rule set with IBM® InfoSphere® Business Glossary terms,
policies, and contacts
- Build data rule definitions or rule set definitions by using the
rule builder
- Add a data rule definition with the free form editor
Characteristics of data rules
You can use
data rules to evaluate and analyze conditions found during data profiling,
to conduct a data quality assessment, to provide more information
to a data integration effort, or to establish a framework for validating
and measuring data quality over time.
You can construct data rules generically by using rule definitions.
A definition describes the rule evaluation or condition. When you
associate physical data sources with the definition, the resulting
data rule can be run to return analysis statistics and detailed
results. The following figure shows the process of creating a data
rule definition and generating the subsequent data rule:
Figure 1. Process of creating and running a data rule definition
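The relationship between a generic rule definition and an executable data rule can be illustrated with a short Python sketch. This is a conceptual model only, not the InfoSphere Information Analyzer API: the `RuleDefinition` and `DataRule` classes, the `bind` and `run` methods, and the column names are all hypothetical names chosen for illustration. The sketch assumes that a definition is written against logical variables and becomes runnable once each variable is bound to a physical column.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class RuleDefinition:
    """Generic rule logic written against logical variable names (hypothetical model)."""
    name: str
    variables: Tuple[str, ...]
    condition: Callable[..., bool]

    def bind(self, bindings: Dict[str, str]) -> "DataRule":
        # Every logical variable must be mapped to a physical column.
        missing = set(self.variables) - set(bindings)
        if missing:
            raise ValueError(f"unbound variables: {sorted(missing)}")
        return DataRule(self, dict(bindings))


@dataclass
class DataRule:
    """A rule definition bound to the columns of a physical data source."""
    definition: RuleDefinition
    bindings: Dict[str, str]  # logical variable -> physical column name

    def run(self, rows: List[Record]) -> Dict[str, object]:
        # Evaluate the condition per record and collect summary and detail results.
        violations = []
        for row in rows:
            args = {var: row[col] for var, col in self.bindings.items()}
            if not self.definition.condition(**args):
                violations.append(row)
        total = len(rows)
        return {
            "total": total,
            "met": total - len(violations),
            "not_met": len(violations),
            "violations": violations,
        }


# One definition, reused against two sources with different column names.
age_is_valid = RuleDefinition(
    "age_is_valid",
    ("age",),
    lambda age: age is not None and 0 <= age <= 120,
)

customers = [{"CUST_AGE": 34}, {"CUST_AGE": -1}, {"CUST_AGE": None}]
employees = [{"EMP_AGE": 51}, {"EMP_AGE": 29}]

customer_result = age_is_valid.bind({"age": "CUST_AGE"}).run(customers)
employee_result = age_is_valid.bind({"age": "EMP_AGE"}).run(employees)
```

In this model the same definition produces two independent data rules, and each run returns both summary counts (`met`, `not_met`) and the detail records that violated the condition.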
IBM InfoSphere Information Analyzer data rules include the following characteristics:
- Reusable: The definitions are not explicit to one data source, but
can be used and applied to many data sources.
- Quickly evaluated: They can be tested interactively, as they are
being created, to ensure that they deliver expected information.
- Produce flexible output: Data rules can produce a variety of
statistics and results, at both the summary and detail levels. You can
evaluate data either positively (how many records meet your condition)
or negatively (how many records violate your condition), and control
the specifics of what you need to view to understand specific issues.
- Historical: They capture and retain execution results over time,
allowing you to view, monitor, and annotate trends.
- Managed: Each data rule has a defined state, such as draft or
accepted, so that you can identify the status of each rule.
- Categorical: You can organize data rules within relevant categories
and folders.
- Deployable: You can transfer data rules to another environment. For
example, you can export and transfer a data rule to a production
environment.
- Audit functionality: You can identify specific events associated
with a rule, such as who modified a rule and the date that it was last
modified.
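As a rough illustration of the historical characteristic, the following Python sketch shows how retained execution results might be compared across runs. It is not product code: the `RunResult` class, the `trend` function, and the sample figures are hypothetical and invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class RunResult:
    """Summary statistics retained from one execution of a data rule (hypothetical)."""
    run_date: date
    total: int
    met: int

    @property
    def pct_met(self) -> float:
        # Percentage of records that met the rule condition in this run.
        return 100.0 * self.met / self.total if self.total else 0.0


def trend(history: List[RunResult]) -> List[float]:
    """Change in percent-met between consecutive runs, oldest first."""
    ordered = sorted(history, key=lambda r: r.run_date)
    return [round(b.pct_met - a.pct_met, 2) for a, b in zip(ordered, ordered[1:])]


# Made-up execution history for a single data rule.
history = [
    RunResult(date(2024, 1, 1), total=200, met=180),
    RunResult(date(2024, 2, 1), total=200, met=190),
    RunResult(date(2024, 3, 1), total=250, met=225),
]
deltas = trend(history)
```

A positive delta means that a larger share of records met the condition than in the previous run, and a negative delta means that the share dropped.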