Pattern matching criteria

When you run a pattern matching search, you specify the method for evaluating possible matches and the margin of error that is allowed between the search pattern and matching values.

Search method

The search method that you choose depends on whether the pattern that you are looking for occurs over a logical time interval or occurs at any time:

Whole pattern match search
Evaluates consecutive sequences of values that do not overlap. Sequences start at the origin of the time series. A whole pattern match search is useful if you are searching for a pattern that is associated with a logical time interval, such as a day or a week. For example, if the origin of your time series starts at the time 00:00:00, you can search for a daily electricity usage pattern that starts at 00:00:00.
Subsequence pattern match search
Evaluates every possible overlapping subsequence of values. A subsequence pattern match search is useful if you are searching for a pattern that can happen at any time. For example, if you are looking for a pattern that indicates an electrical outage, you want to evaluate possible matches that start with every timepoint.

For example, suppose that you want to search for the pattern (55),(55),(55),(55) in the following set of values:

(1),(1),(55),(55),(55),(55),(1),(45),(45),(45),(45),(1)

A whole pattern match search evaluates each consecutive sequence of four timepoints:

(1),(1),(55),(55)
                  (55),(55),(1),(45) 
                                   (45),(45),(45),(1)

A subsequence pattern match search evaluates every possible subsequence of four timepoints:

(1),(1),(55),(55)
    (1),(55),(55),(55)
        (55),(55),(55),(55)
             (55),(55),(55),(1)
                  (55),(55),(1),(45)
                       (55),(1),(45),(45)
                            (1),(45),(45),(45)
                                (45),(45),(45),(45)
                                      (45),(45),(45),(1)                
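
The difference between the two search methods can be sketched in a few lines of Python. This is an illustration only, not the product's implementation; the function names whole_pattern_windows and subsequence_windows are hypothetical, and the series is modeled as a plain list that starts at the origin.

```python
def whole_pattern_windows(values, length):
    """Non-overlapping windows that start at the origin of the series."""
    last_start = len(values) - length
    return [values[i:i + length] for i in range(0, last_start + 1, length)]


def subsequence_windows(values, length):
    """Every overlapping window, one starting at each possible timepoint."""
    last_start = len(values) - length
    return [values[i:i + length] for i in range(0, last_start + 1)]


series = [1, 1, 55, 55, 55, 55, 1, 45, 45, 45, 45, 1]

# Three candidate sequences: offsets 0, 4, and 8.
for window in whole_pattern_windows(series, 4):
    print(window)

# Nine candidate sequences: offsets 0 through 8.
print(len(subsequence_windows(series, 4)))  # 9
```

For a series of n values and a pattern of length m, the whole pattern method in this sketch produces n // m candidates, and the subsequence method produces n - m + 1.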

Margin of error

You control the margin of error for pattern matches by specifying how close possible matches must be to the search pattern.

Unit error
An absolute value that represents the limit of how much each matching value can differ from the corresponding value in the search pattern. A possible match is evaluated by comparing the Euclidean distance between the target sequence and the search pattern sequence to a factor of the unit error.

The following equation shows how to compute the Euclidean distance for a target sequence T[1...L] and the query pattern sequence Q[1...L].

Figure 1. Euclidean distance
Euclid_Dist(T,Q) = sqrt( (T[1] - Q[1])^2 + (T[2] - Q[2])^2 + ... + (T[L] - Q[L])^2 )
The Euclidean distance of (T,Q) is equal to the square root of the sum of the squares of the differences between the corresponding values of the sequences.

The target sequence satisfies the Euclidean distance condition if the square of the Euclidean distance of the two sequences is less than or equal to the number of values in the sequence, L, times the square of the unit error, u:

Euclid_Dist(T,Q)^2 ≤ L * u^2

Similarity threshold
A double-precision number from 0.0 through 1.0 that represents the fraction of values that must be within the unit error for a sequence to be a match. For example, if the similarity threshold is 0.50, then at least half the values of a match must be within the unit error.

A pattern is a match if it satisfies both the Euclidean distance condition and the similarity threshold.
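
The two criteria can be written as small Python helpers. This is a minimal sketch that reads the definitions literally, not the product's implementation: the function names are hypothetical, and treating "within the unit error" as an inclusive comparison (<=) is an assumption.

```python
import math


def euclidean_distance(target, query):
    """Square root of the sum of squared differences between two sequences."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(target, query)))


def satisfies_distance_condition(target, query, unit_error):
    """True if Euclid_Dist(T,Q)^2 <= L * u^2, where L is the sequence length."""
    squared_distance = sum((t - q) ** 2 for t, q in zip(target, query))
    return squared_distance <= len(query) * unit_error ** 2


def fraction_within_unit_error(target, query, unit_error):
    """Fraction of values that differ from the query by at most unit_error.

    Assumption: the comparison is inclusive (<=).
    """
    hits = sum(1 for t, q in zip(target, query) if abs(t - q) <= unit_error)
    return hits / len(query)


query = [55, 55, 55, 55]
target = [45, 45, 45, 45]

print(euclidean_distance(target, query))                  # 20.0
print(satisfies_distance_condition(target, query, 11.0))  # True: 400 <= 4 * 121
print(fraction_within_unit_error(target, query, 11.0))    # 1.0
```

The distance condition compares squared quantities, so the sketch skips the square root when it tests the condition.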

For example, suppose that you want to search for the same pattern, (55),(55),(55),(55), in the same set of values as in the previous examples:

(1),(1),(55),(55),(55),(55),(1),(45),(45),(45),(45),(1)

If the unit error is 0.5 and the similarity threshold is 0.9 (90% of the values must be within the unit error), then a subsequence pattern match search identifies the following match:

(55),(55),(55),(55)

A whole pattern match search does not identify any matches because none of the consecutive sequences contain enough matching values.

If the unit error is 0.5 and the similarity threshold is 0.5, then a subsequence pattern match search identifies the following matches:

(1),(1),(55),(55)
    (1),(55),(55),(55)
        (55),(55),(55),(55)
             (55),(55),(55),(1)
                  (55),(55),(1),(45)

A whole pattern match search identifies the following matches:

(1),(1),(55),(55)
                  (55),(55),(1),(45)

If the unit error is 11.0 and the similarity threshold is 0.75, then a subsequence pattern match search identifies the following matches:

    (1),(55),(55),(55)
        (55),(55),(55),(55)
             (55),(55),(55),(1)
                  (55),(55),(1),(45)
                       (55),(1),(45),(45)
                            (1),(45),(45),(45)
                                (45),(45),(45),(45)
                                      (45),(45),(45),(1)                  

A whole pattern match search identifies the following matches:

                  (55),(55),(1),(45)
                                   (45),(45),(45),(1)
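
The match lists in these examples can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the product's implementation: the function names search and fraction_within are hypothetical, results are reported as zero-based start offsets, and only the per-value similarity threshold check is applied. That check by itself yields the lists shown in this topic; the Euclidean distance condition is not modeled here.

```python
def fraction_within(window, pattern, unit_error):
    """Fraction of window values within unit_error of the pattern values."""
    hits = sum(1 for w, p in zip(window, pattern) if abs(w - p) <= unit_error)
    return hits / len(pattern)


def search(values, pattern, unit_error, threshold, whole=False):
    """Return start offsets of windows that meet the similarity threshold.

    whole=True steps by the pattern length (whole pattern match search);
    whole=False steps by one timepoint (subsequence pattern match search).
    """
    size = len(pattern)
    step = size if whole else 1
    starts = range(0, len(values) - size + 1, step)
    return [s for s in starts
            if fraction_within(values[s:s + size], pattern, unit_error) >= threshold]


series = [1, 1, 55, 55, 55, 55, 1, 45, 45, 45, 45, 1]
pattern = [55, 55, 55, 55]

print(search(series, pattern, 0.5, 0.9))                # [2]
print(search(series, pattern, 0.5, 0.9, whole=True))    # []
print(search(series, pattern, 0.5, 0.5))                # [0, 1, 2, 3, 4]
print(search(series, pattern, 0.5, 0.5, whole=True))    # [0, 4]
print(search(series, pattern, 11.0, 0.75))              # [1, 2, 3, 4, 5, 6, 7, 8]
print(search(series, pattern, 11.0, 0.75, whole=True))  # [4, 8]
```

With a unit error of 11.0, the value 45 counts as within the unit error of 55 because the difference is 10, which is why sequences that contain three values of 45 and one value of 1 meet the 0.75 threshold.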

Copyright© 2018 HCL Technologies Limited