Characteristics of good sample data

In the Standardization Rules Designer, you work with records from sample data when you add or modify rules and classifications. The sample data that you choose informs the changes that you make in the Standardization Rules Designer and the standardization processes that result.

When you choose sample data to work with in the Standardization Rules Designer, ensure that the data has the following characteristics:

Remember that the Standardization Rules Designer processes every value in a record. When you prepare sample data, ensure that the sample records contain only the values that records in your actual data will contain at a particular point in the standardization process.

Suppose that you want to add rules in the Standardization Rules Designer that are applied to records after all other actions in the pattern-action specification (previously called the pattern-action file). Note how the previous actions in the pattern-action specification change the records, and prepare your sample data accordingly. For example, previous actions might remove values from further processing by assigning those values to the NULL class. In that case, remove those values from the sample data so that the sample records contain only what the new rules will actually process.
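The preparation step described above can be sketched outside the product as a small preprocessing script. This is a minimal illustration, not part of the Standardization Rules Designer: the `NULL_CLASS_VALUES` set, the `strip_null_class_values` function, and the example records are all hypothetical, and the sketch assumes that values are separated by whitespace and compared without regard to case.

```python
# Hypothetical values that earlier pattern-action steps assign to the NULL class.
# In practice, derive this list from your own pattern-action specification.
NULL_CLASS_VALUES = {"THE", "OF", "ATTN"}


def strip_null_class_values(record, null_values=NULL_CLASS_VALUES):
    """Return the record without the values that earlier actions would nullify.

    Assumes whitespace-separated values and a case-insensitive comparison.
    """
    kept = [value for value in record.split() if value.upper() not in null_values]
    return " ".join(kept)


# Raw records as they look before any pattern-action processing (made-up data).
raw_records = ["ATTN 123 MAIN ST", "THE 45 OAK AVE"]

# Sample records as the new rules would see them at the end of the specification.
sample_records = [strip_null_class_values(record) for record in raw_records]
```

With the made-up input above, `sample_records` holds the records without `ATTN` and `THE`, which is the form you would import as sample data for rules that run last.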

After you import a sample data set, verify in the Standardization Rules Designer that the sample data set matches the format and content that you require. You can browse the patterns in the data and ensure that the distribution of patterns matches your expectations for the data set. For example, if the most common pattern in the sample data set is a pattern that you know is relatively rare in your actual data, the sample data set might not represent your actual data adequately.
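If you also want a rough check before import, the same comparison of pattern distributions can be approximated in a script. The sketch below rests on several assumptions: the function names are hypothetical, the symbols (`^` for all-digit values, `+` for all-letter values, `@` for anything else) are a simplified stand-in chosen for illustration and do not reproduce how the product classifies values, and the expected shares are numbers that you would supply from your own knowledge of the actual data.

```python
from collections import Counter


def simple_pattern(record):
    """Map each whitespace-separated value to an illustrative symbol."""
    symbols = []
    for value in record.split():
        if value.isdigit():
            symbols.append("^")   # all digits
        elif value.isalpha():
            symbols.append("+")   # all letters
        else:
            symbols.append("@")   # anything else, for example mixed values
    return " ".join(symbols)


def pattern_shares(records):
    """Return the fraction of records that fall under each pattern."""
    counts = Counter(simple_pattern(record) for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: count / total for pattern, count in counts.items()}


def unexpected_patterns(sample_shares, expected_shares, tolerance=0.15):
    """List patterns whose share in the sample differs from the expected
    share by more than the tolerance."""
    patterns = set(sample_shares) | set(expected_shares)
    return sorted(
        pattern
        for pattern in patterns
        if abs(sample_shares.get(pattern, 0.0) - expected_shares.get(pattern, 0.0))
        > tolerance
    )


# Made-up sample records and made-up expectations for the actual data.
sample = ["123 MAIN ST", "45 OAK AVE", "PO BOX 9", "7 ELM RD"]
expected = {"^ + +": 0.4, "+ + ^": 0.6}

flagged = unexpected_patterns(pattern_shares(sample), expected)
```

Any pattern that appears in `flagged` is over- or under-represented relative to your expectations, which suggests that the sample data set might not represent the actual data adequately. The tolerance is arbitrary; choose a value that suits the size of your sample.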