The Generic annotator allows you to search and analyze
log files for which a specific annotator is not available. The Generic
annotator creates two types of annotations: Concepts and Key-value
pairs. This section outlines the purpose, scope,
and use of the IBM® Operations Analytics - Log Analysis Generic
annotator.
Included concept tokens
A concept is a piece
of text that represents a real world entity such as an IP address
or a hostname. These concepts are useful for searches as they provide
information that can assist you in diagnosing issues. The Generic
annotator includes support for these annotation tokens:
- Hostname
- Names given to devices that connect to a network and that are
referenced in the log record.
- IP Address
- Numeric labels given to devices that are connected to a network
and that are referenced in the log record.
- Severity Level
- The indicator of the severity of an event in a log record. The
Generic annotator provides annotation for these severity levels:
- SUCCESS
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- OFF
- CRITICAL
- CRITICAL_ERROR
- SEVERE
- IGNORE
- WARNING
- CONFIG
- FINE
- FINER
- FINEST
- ALL
- URL
- Web addresses listed in the log record.
- Identifier
- Patterns intended to capture names of constants that might repeat
within the log record and that signify the occurrence of some event.
For example, ECH_PING_FAIL_BCKP. The Generic annotator
assumes that an identifier is a sequence of alphanumeric characters
in capitals that may be separated by underscores.
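The identifier rule described above can be sketched as a regular expression. The following Python snippet is only an illustration and not the annotator's actual implementation: the pattern, the IDENTIFIER_RE name, and the find_identifiers helper are assumptions made for this example. It also assumes that an identifier contains at least one capital letter, so that plain numbers are not reported.

```python
import re

# Assumed pattern: runs of capital letters and digits, optionally joined by
# underscores, containing at least one capital letter somewhere.
IDENTIFIER_RE = re.compile(r"\b(?=[0-9_]*[A-Z])[A-Z0-9]+(?:_[A-Z0-9]+)*\b")


def find_identifiers(record):
    """Return every substring of the log record that looks like an identifier."""
    return IDENTIFIER_RE.findall(record)


print(find_identifiers("Ping failed: ECH_PING_FAIL_BCKP on host01 at 4499"))
```

This sketch does not apply the other concept rules, so a severity word such as ERROR or a hexadecimal string such as 7AB87F also matches the pattern.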
Excluded concept tokens
The Generic annotator
assumes that these tokens are noise and they are ignored:
- Date and time
- For the purposes of analysis, date and time are not useful.
- Number
- The Generic annotator ignores both whole and decimal numbers.
- Hexadecimal numbers
- The Generic annotator ignores hexadecimal numbers such as 7AB87F.
- Stop words
- A list of stop words has been defined for the Generic annotator.
This is to allow the Generic annotator to ignore common words that
might appear frequently, but offer no value in an analysis of the
log records.
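The exclusion rules above can be illustrated with a small token filter. This is a sketch under stated assumptions, not the annotator's implementation: the stop word list shipped with the product is not reproduced here, so STOP_WORDS is a hypothetical stand-in, and the is_noise and keep_tokens helpers are names invented for this example. Date and time detection is omitted, and a hexadecimal token is assumed to contain at least one digit so that ordinary words made of the letters a to f are not discarded.

```python
import re

# Hypothetical stop word list; the product's real list is not shown here.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "on", "at"}

NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")      # whole and decimal numbers
HEX_RE = re.compile(r"(?:0[xX])?[0-9A-Fa-f]+")     # hexadecimal strings


def is_noise(token):
    """Return True if the token falls into one of the ignored categories."""
    if token.lower() in STOP_WORDS:
        return True
    if NUMBER_RE.fullmatch(token):
        return True
    # Assumption: require a digit so that words such as "added" are kept.
    if HEX_RE.fullmatch(token) and any(ch.isdigit() for ch in token):
        return True
    return False


def keep_tokens(record):
    """Split a record on whitespace and drop the tokens treated as noise."""
    return [tok for tok in record.split() if not is_noise(tok)]


print(keep_tokens("the request to 10.5 failed at 7AB87F on retry 3"))
```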
Key-value pairs
A Key-value annotation extracts data from a log record if it is in
the format <key> = <value>. For example, ERROR-CODE = 4499. These
Key-value pairs can be used to list the values for each key. These
limitations apply to Key-value pair annotations:
- Colon separator
- Key-value pairs that are separated by a colon are excluded. For
example, Label: ECH_PING_FAIL_BCKP.
- Hyphen prefix
- Key-value pairs where the value begins with a hyphen are excluded.
For example, ERRORCODE = -4499.
- Numbers with commas
- Key-value pairs where the value includes a comma are excluded.
For example, ERRORCODE = 4,499.
- Forward and backward slash characters
- Key-value pairs where the value contains a forward or backward
slash are excluded. For example, path = /opt/IBM/.
- Quotes
- Key-value pairs where the value is contained within quotation
marks are excluded. For example, language = “English”.
- Delimiter characters
- Some limitations exist where the value in a Key-value pair contains
a delimiter. However, these depend on whether the value contains
a token that can be annotated based on the list of included tokens.
For example, Time = Thu Nov 22 06:28:48 EST 2012 is
delimited by a space after Thu, and therefore the Key-value
pair is assumed to be Key = Time, Value =
Thu. However, a Date and Time annotator can annotate the
full value to give a value of Key = Time, Value
= Thu Nov 22 06:28:48 EST 2012.
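The limitations listed above can be checked against the examples in this section with a short sketch. It is illustrative only: the key character set, the KV_RE pattern, and the extract_pair helper are assumptions for this example, and the case where another annotator extends the value (as in the Date and Time example) is not modeled.

```python
import re

# Assumed shape: a key that starts with a letter, an equal sign with optional
# whitespace around it, and a value that ends at the first whitespace character.
KV_RE = re.compile(r"(?P<key>[A-Za-z][\w.-]*)\s*=\s*(?P<value>\S+)")

# Characters that cause a value to be excluded: comma, slashes, and quotes.
EXCLUDED_CHARS = {",", "/", "\\", '"', "'", "“", "”"}


def extract_pair(record):
    """Return (key, value) for the first usable pair in the record, or None."""
    match = KV_RE.search(record)
    if match is None:
        return None  # includes colon-separated pairs, which have no '='
    key, value = match.group("key"), match.group("value")
    if value.startswith("-") or any(ch in EXCLUDED_CHARS for ch in value):
        return None
    return key, value


for line in ["ERROR-CODE = 4499", "Label: ECH_PING_FAIL_BCKP",
             "ERRORCODE = -4499", "path = /opt/IBM/",
             "Time = Thu Nov 22 06:28:48 EST 2012"]:
    print(line, "->", extract_pair(line))
```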
Key-value pairs
A Key-value annotation extracts
data from a log record if it is in the format <key>=<value>.
For example, ERROR-CODE = 4499. These Key-value pairs are used to
list the values for each key in the Discovered Patterns section of
the Search UI.
There are two categories of KVP annotations.
The first is Key-value pairs that are separated by an equal sign (=).
The second category is those separated by a colon (:). Each category
has a different set of rules to determine what is a valid annotation.
Key-value pairs separated by an equal sign, '='
- Both the key and value must be one token
- The key can contain upper and lower case letters, dashes (-),
underscores (_), and periods (.)
- The key must begin with a letter
- The value can contain upper and lower case letters, numbers, dashes
(-), underscores (_), periods (.), at signs (@), and colons (:)
- The value can be surrounded by matching brackets [ ], parentheses
( ), angle-brackets < >, single quotes ' ', or double quotes " "
- The value must begin with a letter or a number, and may have an
optional dash (-) at the beginning
- A single whitespace character may be on one or both sides of the
equal sign
- The single token rule for the value is disregarded when a concept
is found for the value. For example, if a multi token date is identified
as the value, the whole date, not just the first token, will be annotated.
- Users may add custom regular expressions to the dictionary located
at Log_Analytics_install_dir/unity_content/GA/GAInsightPack_v1.1.1/extractors/ruleset/GA_common/dicts/userSpecifiedStrings.dict.
Matches to these regular expressions will be used when checking if
the value is part of a larger concept.
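The rules for pairs separated by an equal sign can be approximated with a regular expression. The sketch below is one possible reading of those rules, not the shipped rule set: the names KEY, VALUE, WRAPPERS, EQUALS_KVP_RE, and extract_equals_pairs are invented for this example, and treating a token as text bounded by whitespace is an assumption. The concept override for multi-token values and the userSpecifiedStrings.dict dictionary are not modeled.

```python
import re

KEY = r"[A-Za-z][A-Za-z_.-]*"               # starts with a letter
VALUE = r"-?[A-Za-z0-9][A-Za-z0-9_.@:-]*"   # optional leading dash
WRAPPERS = [("[", "]"), ("(", ")"), ("<", ">"), ("'", "'"), ('"', '"')]

# A value is either bare or enclosed in one matching pair of wrappers.
_value_forms = [VALUE] + [re.escape(o) + VALUE + re.escape(c) for o, c in WRAPPERS]

EQUALS_KVP_RE = re.compile(
    r"(?<!\S)(?P<key>" + KEY + r")"         # key starts a token
    r"\s?=\s?"                              # at most one whitespace per side
    r"(?P<value>" + "|".join(_value_forms) + r")(?!\S)"  # value ends a token
)


def extract_equals_pairs(record):
    """Return (key, value) tuples, with any surrounding wrappers removed."""
    pairs = []
    for match in EQUALS_KVP_RE.finditer(record):
        value = match.group("value")
        if value[0] in "[(<'\"":
            value = value[1:-1]
        pairs.append((match.group("key"), value))
    return pairs


print(extract_equals_pairs("ERROR-CODE = 4499 user=[admin@example.com] path = /opt/IBM/"))
```

In this sketch a pair such as path = /opt/IBM/ is skipped because the value does not begin with a letter, a number, or a dash.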
Key-value pairs separated by a colon, ':'
- Both the key and value must be between 1 and 3 tokens
- The tokens can contain any character except whitespace or colons.
- Tokens must be separated by spaces or tabs
- The colon may have one or more spaces or tabs to the left and
must have at least one space or tab to the right. There may be more
than one space or tab to the right of the colon
- The entire string must be on a line by itself
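The colon rules can be sketched in the same way. The pattern below is an interpretation for illustration, not the annotator's actual rule: TOKEN, GROUP, COLON_KVP_RE, and extract_colon_pairs are hypothetical names, and allowing leading or trailing spaces on the line is an assumption.

```python
import re

TOKEN = r"[^\s:]+"                                    # no whitespace or colons
GROUP = TOKEN + r"(?:[ \t]+" + TOKEN + r"){0,2}"      # one to three tokens

COLON_KVP_RE = re.compile(
    r"^[ \t]*(?P<key>" + GROUP + r")"
    r"[ \t]*:[ \t]+"          # optional space before, required space after
    r"(?P<value>" + GROUP + r")[ \t]*$",
    re.MULTILINE,             # the pair must occupy a whole line
)


def extract_colon_pairs(text):
    """Return (key, value) tuples for each line that is a colon-separated pair."""
    return [(m.group("key"), m.group("value")) for m in COLON_KVP_RE.finditer(text)]


sample = "\n".join([
    "Label: ECH_PING_FAIL_BCKP",
    "Exit status : connection timed out",
    "time:12",          # no space after the colon
    "a b c d: x",       # key has four tokens
])
print(extract_colon_pairs(sample))
```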