In this tutorial, you will use the IBM® InfoSphere® QualityStage® Standardization Rules Designer to
enhance a rule set that standardizes product data. When you standardize
data, you implement data quality standards that normalize data values
and prepare data for uses such as matching and reporting.
In this tutorial, you will use data from the fictional
Sample Outdoor Company, which sells and distributes products to third-party
retailer stores and consumers. Over the last several years, the company
has steadily grown into a worldwide operation, selling its line
of products to retailers in nearly every part of the world.
The fictional Sample Outdoor Company recently acquired several new product
lines. The company wants to integrate the data for these product lines
into its current database, but the incoming data contains new types of
information and is formatted inconsistently. The Sample Outdoor Company
can use the IBM InfoSphere QualityStage Standardization Rules Designer to
enhance the rule set that standardizes this type of data.
After the rule set is enhanced, the company will apply the rule set to a
Standardize stage in a standardization job. When the standardization
job runs, the input data is standardized according to the logic that
is specified in the enhanced rule set.
This tutorial guides
you through some of the common tasks that you might complete when
you enhance a rule set in the
Standardization Rules Designer.
The following steps illustrate the sequence of actions in the tutorial:
- In module 1, you identify a rule set that requires enhancement
and open a revision for that rule set in the Standardization Rules Designer.
You also import sample data to use in the Standardization Rules Designer.
- In module 2, you categorize the parts of the data. You add classification
definitions that assign new values to existing classes and add a custom
class for a new type of data. Figure 1 shows how each value in an
example record for a fictional Sample Outdoor Company product can
be assigned to a class.
- In module 3, you add a lookup table that converts alphabetic information
about product colors to numeric color codes. Figure 2 shows part of the lookup
table that is added in the Standardization Rules Designer.
- In the first lesson of module 4, you modify a rule that was added
in the Standardization Rules Designer.
The rule handles data incorrectly for some of the new product
brands. Figure 3 shows the output with the
current rule and the output after the rule is modified according to
the data cleansing requirements of the fictional Sample Outdoor Company.
- In the second and third lessons of module 4, you identify the
most common unhandled pattern and add a rule to handle data
that matches the pattern. Figure 4 shows how an example record
for a fictional Sample Outdoor Company product is handled by the new
rule.
- In the fourth lesson of module 4, you create a rule that handles
two distinct values that are concatenated in the input data. For example,
if the input data contains the value 195cm,
you can create a rule that splits the value into the values 195 and cm and
places them in the appropriate output columns. Figure 5 shows how an example record
for a fictional Sample Outdoor Company product is handled by this
rule.
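To make the lookup-table and value-splitting ideas from modules 3 and 4 concrete, the following Python sketch shows the same logic outside the product. It is only a conceptual illustration under stated assumptions. In the tutorial itself, you define these behaviors in the Standardization Rules Designer, not in Python. The color names, numeric codes, and output column names (Size, Unit, ColorCode, Unhandled) are hypothetical placeholders. They are not the values in Figure 2 or the columns of the actual rule set.

```python
import re

# Hypothetical lookup table: these color names and numeric codes are
# placeholders, not the values shown in Figure 2 of the tutorial.
COLOR_CODES = {
    "RED": "01",
    "BLUE": "02",
    "GREEN": "03",
}

# Matches a number immediately followed by letters, such as "195cm".
_NUMBER_UNIT = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]+)$")


def split_number_unit(token):
    """Split a concatenated value such as '195cm' into ('195', 'cm').

    Returns None when the token does not have the number-plus-unit shape.
    """
    match = _NUMBER_UNIT.match(token)
    if match is None:
        return None
    return match.group(1), match.group(2)


def color_to_code(color):
    """Convert an alphabetic color name to its numeric code, or None if unknown."""
    return COLOR_CODES.get(color.strip().upper())


def standardize(tokens):
    """Place each input token in a hypothetical output column."""
    record = {"Size": None, "Unit": None, "ColorCode": None, "Unhandled": []}
    for token in tokens:
        split = split_number_unit(token)
        code = color_to_code(token)
        if split is not None:
            # Concatenated value: put the number and the unit in separate columns.
            record["Size"], record["Unit"] = split
        elif code is not None:
            # Lookup-table conversion from color name to numeric code.
            record["ColorCode"] = code
        else:
            # No rule handles this token, so keep it for later review.
            record["Unhandled"].append(token)
    return record


print(standardize(["TENT", "195cm", "Green"]))
# {'Size': '195', 'Unit': 'cm', 'ColorCode': '03', 'Unhandled': ['TENT']}
```

The token "TENT" ends up in the Unhandled column because no rule in this sketch applies to it. This is comparable to the unhandled patterns that you find and address in module 4.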
Learning objectives
By completing
the modules, you will learn about the concepts and tasks for enhancing
rule sets:
- Import sample data to see how records from the sample data are
affected by changes to parts of the rule set
- Use classifications to categorize parts of your data
- Add lookup tables to compare or convert data to specified values
- Create rules that apply actions to a group of related records
Time required
Before you begin the tutorial,
you must set up your environment. The time that is required for setup
depends on your current environment.
Each module takes 20 to 60 minutes to complete.
System requirements
The following
components and applications must be installed on your system.
- IBM InfoSphere QualityStage Standardization Rules Designer
- IBM InfoSphere QualityStage with the following clients:
  - IBM InfoSphere DataStage® and QualityStage Designer
  - IBM InfoSphere DataStage and QualityStage Administrator
Prerequisites
Before you begin this tutorial, you must understand data quality concepts such as standardization,
classification, and rules for data cleansing. Knowledge of InfoSphere DataStage and QualityStage concepts
such as jobs, stages, and reports is helpful but not required.
Notices: The Sample Outdoor Company, GO Sales,
any variation of the Great Outdoors name, and Planning Sample, depict
fictitious business operations with sample data used to develop sample
applications for IBM and IBM customers. These fictitious
records include sample data for sales transactions, product distribution,
finance, and human resources. Any resemblance to actual names, addresses,
contact numbers, or transaction values is coincidental. Other sample
files may contain fictional data manually or machine generated, factual
data compiled from academic or public sources, or data used with permission
of the copyright holder, for use as sample data to develop sample
applications. Product names referenced may be the trademarks of their
respective owners. Unauthorized duplication is prohibited.