
Lesson 4.1: Modifying a rule to handle data correctly

In this lesson, you ensure that all records that match the most common pattern are handled correctly by modifying the rule that handles those records.

Overview

The standardization goals of the fictional Sample Outdoor Company require that the names of some products include the product brand name. For the most common pattern in the data, the company added a rule that maps values in the input records to the correct output columns. However, that rule does not add the product brand name to the ProductName output column for any brand. The rule must be modified so that, for some brands, the brand name is added to the ProductName output column.

Rules are processes that standardize groups of related records. Rules can apply to records that match the same pattern or to exact strings of text. When you add or modify a rule, you map values in the input records to output columns, specify actions that manipulate the data, and identify conditions to ensure that rules apply only to the correct records.
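As a rough illustration of these three parts (pattern, mapping, and condition), the following Python sketch models a rule as plain data. It is not how the Standardization Rules Designer stores or runs rules. The function name apply_rule, the dictionary layout, and every output column name except ProductName are assumptions made for this example.

```python
# Illustrative model of a rule; not product code.
# Column names other than ProductName are hypothetical.

def apply_rule(tokens, pattern, rule):
    """Apply the rule to one record, or return None if it does not apply."""
    # The rule applies only to records that match its pattern.
    if pattern != rule["pattern"]:
        return None
    # A condition can narrow the rule to a subset of matching records.
    if not rule["condition"](tokens):
        return None
    output = {}
    for position, column in rule["mapping"]:
        output.setdefault(column, []).append(tokens[position])
    # Values that are mapped to the same column are separated by a space.
    return {column: " ".join(values) for column, values in output.items()}


copy_rule = {
    "name": "Copy product data to output columns",
    "pattern": "B+SCT",
    # (token position, output column); the brand at position 0 is not mapped.
    "mapping": [
        (1, "ProductName"),
        (2, "ProductSize"),    # hypothetical column
        (3, "ProductColor"),   # hypothetical column
        (4, "ProductType"),    # hypothetical column
    ],
    "condition": lambda tokens: True,  # no extra condition
}

tokens = "ANTONI BELLA JUNIOR BLUE EYEWEAR".split()
result = apply_rule(tokens, "B+SCT", copy_rule)
print(result["ProductName"])  # BELLA
```

In this sketch the ProductName value is BELLA without the brand name, which mirrors the problem that this lesson corrects.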

A rule group is a collection of rules that are applied to records at the same point in the standardization process. To ensure that rules are applied in a particular order, you can organize the rules into rule groups in the Standardization Rules Designer. Then, you can invoke the rule groups from the pattern-action specification (previously called the pattern-action file).

Procedure

  1. Click the Rules tab, select the Input_Overrides rule group, and then click Open.
  2. Expand Pattern Rule. A list of patterns in the data is shown.
  3. Expand B+SCT, and then expand Copy product data to output columns, which is the only rule for the pattern. The rule maps values in the input records to output columns.

    Note that the frequency of the rule is equal to the frequency of the pattern: 26.64% of records in the sample data match the pattern B+SCT, and the same percentage of records is handled by the rule that is named Copy product data to output columns. This information indicates that the rule applies to all of the records that match this pattern.

  4. If the example record on the Define Rule page is not ANTONI BELLA JUNIOR BLUE EYEWEAR, select ANTONI BELLA JUNIOR BLUE EYEWEAR from the list of example records. If ANTONI BELLA JUNIOR BLUE EYEWEAR is not in the list of example records, increase the number of records in the list.
    1. In the upper-right corner of the Browse Rules pane, click the Example Record Settings button.
    2. From the Records to Display list, select 100.
    3. Click OK. The list of example records includes ANTONI BELLA JUNIOR BLUE EYEWEAR. You can select the record from the list.
  5. From the example record, drag the value ANTONI, which is a product brand name, to the ProductName output column. To ensure that the product brand name is included before the product name in the output column, you might need to drag the BELLA product name after the product brand name.

    The product brand name is in the ProductName output column and is separated from the rest of the product name by a space.
  6. In the ProductName output column, right-click the ANTONI value and click Edit Action. In the Edit Action window, you can manipulate the data that is sent to the output column.
  7. In the Look up the Object section, click Yes.
  8. Specify a list of the product brand names to add to the ProductName output column:
    1. From the Source list, select List.
    2. In the List table, enter the product brand names that must be added to the product name. The following brands include the product brand name in the product name:
      • EPOCH
      • FIREFLY
      • HAILSTORM
      • ANTONI

      The List table shows the values EPOCH, FIREFLY, HAILSTORM, and ANTONI. For each value, the returned value is the same as the value, and the similarity threshold is 900.
    3. From the If Found list, select Convert to Returned Value. If an input record contains a product brand name that is in the list, the product brand name is added to the ProductName output column.
    4. From the If Not Found list, select Do Nothing (Stop Action). With this option, product brand names that are not in the list are not added to the ProductName output column.
    5. Click OK.
  9. Click Apply.
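
The effect of the lookup action that is configured in steps 7 and 8 can be sketched in Python as follows. This is a minimal illustration, not product code. It assumes that a similarity threshold of 900 requires an exact match, and the names BRAND_LOOKUP and build_product_name, as well as the brand ACME, are hypothetical.

```python
# Illustrative sketch of the lookup action; not product code.
# Assumption: a similarity threshold of 900 means exact match only.

# List table: value -> returned value (identical in this lesson).
BRAND_LOOKUP = {
    "EPOCH": "EPOCH",
    "FIREFLY": "FIREFLY",
    "HAILSTORM": "HAILSTORM",
    "ANTONI": "ANTONI",
}


def build_product_name(brand, product):
    """Return the ProductName value for one record."""
    returned = BRAND_LOOKUP.get(brand)
    if returned is None:
        # If Not Found: Do Nothing (Stop Action). The brand is not added.
        return product
    # If Found: Convert to Returned Value. The brand precedes the
    # product name, separated by a space.
    return f"{returned} {product}"


print(build_product_name("ANTONI", "BELLA"))  # ANTONI BELLA
print(build_product_name("ACME", "BELLA"))    # BELLA (ACME is a made-up brand)
```

Only brands in the list change the output. Records for every other brand keep the product name that the original rule produced.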