Indexing configuration

To control how IBM® Operations Analytics - Log Analysis indexes records from a log file, you can create indexing settings for your content Insight Pack.

The indexing configuration settings specify the data type for each field that is indexed. The settings also specify a set of indexing attributes for each field. The index processing engine uses these attributes to define how a field is processed.

One configuration is defined for each Source Type that is contained in an Insight Pack. For more information about Source Types, see the topic about Source Types in the IBM Operations Analytics - Log Analysis Administration Guide.

The index configuration settings are defined in JavaScript Object Notation (JSON) format. To edit the index configuration settings, use the Eclipse-based tooling that is provided with IBM Operations Analytics - Log Analysis. For more information about how to edit the index configuration settings, see Editing the index configuration.

The indexing configuration specification consists of the following attributes:
  • indexConfigMeta contains some basic metadata information about the indexing configuration itself. This information includes the following attributes:
    • name specifies the name of the indexing configuration. For example, WAS SystemOut Config.
    • description specifies the description of the indexing configuration. For example, WAS SystemOut indexing config.
    • version specifies the version of the indexing configuration. For example, 1.0.
    • lastModified specifies the last modified date. For example, 01/11/2013.
  • fields defines the field descriptions for each record to be indexed. IBM Operations Analytics - Log Analysis uses the following field descriptions to define the data for each field that is indexed:
    • fieldname specifies the name of the field to be indexed.
    • dataType specifies the data type of the field to be indexed. The value can be TEXT, LONG, DOUBLE, or DATE.
    • indexing attributes are five attributes that contain Boolean (true or false) values. IBM Operations Analytics - Log Analysis uses the five attributes to indicate how the field is processed. The five attributes are:
      • retrievable
      • retrieveByDefault
      • sortable
      • filterable
      • searchable
    For more information about field configuration, see Field configuration.
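
The field descriptions above can be checked mechanically before a configuration is deployed. The following Python sketch is illustrative only: the check_field helper is hypothetical and is not part of IBM Operations Analytics - Log Analysis, and it assumes that every field must set all five indexing attributes explicitly, which the product might not require.

```python
# Hypothetical helper, not part of the product. It assumes that all five
# indexing attributes must be present on every field.
ALLOWED_DATA_TYPES = {"TEXT", "LONG", "DOUBLE", "DATE"}
INDEXING_ATTRIBUTES = (
    "retrievable",
    "retrieveByDefault",
    "sortable",
    "filterable",
    "searchable",
)


def check_field(name, definition):
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    if definition.get("dataType") not in ALLOWED_DATA_TYPES:
        problems.append(f"{name}: dataType must be one of {sorted(ALLOWED_DATA_TYPES)}")
    for attribute in INDEXING_ATTRIBUTES:
        # JSON true/false values load as Python bool values.
        if not isinstance(definition.get(attribute), bool):
            problems.append(f"{name}: {attribute} must be true or false")
    return problems


class_name = {
    "dataType": "TEXT",
    "retrievable": True,
    "retrieveByDefault": True,
    "sortable": False,
    "filterable": True,
    "searchable": True,
}
print(check_field("className", class_name))  # prints []
```
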
IBM Operations Analytics - Log Analysis also uses an attribute that is called Source during indexing. The following outline summarizes the overall indexing configuration, followed by the structure of the Source attribute:
indexConfigMeta
timeZone
fields:
   <field name>
   <data type>
   <list of indexing attributes such as sortable, searchable.>

"source": {
    "paths": [json_path1, json_path2, ..., json_pathN],
    "dateFormats": [date_format1, date_format2],
    "combine": "one of two possible values: ALL or FIRST"
}
The Source attribute consists of three other attributes:
paths

The paths attribute contains an array of one or more JSON path expressions.

dateFormats

The dateFormats attribute is only relevant for fields that use the DATE type. It is used to specify format strings that determine how date values that are entered in this field are parsed.

Attention: The number of elements in the array must be the same for both the paths and dateFormats attributes.

combine

The combine attribute determines how the values that are returned by the paths and dateFormats attributes are used. The combine attribute has two possible values, ALL or FIRST. ALL is the default value.

If combine is set to ALL, all the non-null values from all the paths are added to the content of the corresponding field. This setting allows an index field to be populated from multiple attributes in the JSON record that you specify.

For example, consider a scenario where you want to index all the host names that are associated with each record into a single indexed field. The host names can be part of the structured metadata that belongs to an incoming log record or they can be extracted by analytics from a log message. For example, IBM Operations Analytics - Log Analysis generates the following JSON structure after the annotation is complete:
{
    "logRecordID": "3344564533",
    "hostname": "host1.ibm.com",
    "message": "Server failed to ping host2.ibm.com and host3.ibm.com",
    "Annotations": {
        "hosts": [
            {"name": "host2.ibm.com", "begin": 22, "end": 35},
            {"name": "host3.ibm.com", "begin": 40, "end": 53}
        ]
    }
}
To ensure that the value for the field that is indexed includes both of the host names that are related to the annotated record, you use the following source attribute definition in the indexing configuration:
"source": {
    "paths": ["hostname", "Annotations.hosts.name"],
    "combine": "ALL"
}
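
The effect of the ALL setting can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the product's implementation: the resolve and combine_all functions are hypothetical, and the sketch assumes that a dotted path such as Annotations.hosts.name visits every element of an array that it passes through.

```python
def resolve(node, keys):
    """Collect every non-null, non-empty value that a list of keys reaches."""
    if isinstance(node, list):
        # Assumption: a path fans out over each element of an array.
        return [value for item in node for value in resolve(item, keys)]
    if not keys:
        return [] if node in (None, "") else [node]
    if isinstance(node, dict) and keys[0] in node:
        return resolve(node[keys[0]], keys[1:])
    return []


def combine_all(record, paths):
    """ALL: add the values from every path to the field, in path order."""
    values = []
    for path in paths:
        values.extend(resolve(record, path.split(".")))
    return values


record = {
    "logRecordID": "3344564533",
    "hostname": "host1.ibm.com",
    "message": "Server failed to ping host2.ibm.com and host3.ibm.com",
    "Annotations": {
        "hosts": [
            {"name": "host2.ibm.com", "begin": 22, "end": 35},
            {"name": "host3.ibm.com", "begin": 40, "end": 53},
        ]
    },
}

print(combine_all(record, ["hostname", "Annotations.hosts.name"]))
# prints ['host1.ibm.com', 'host2.ibm.com', 'host3.ibm.com']
```
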

If combine is set to FIRST, the JSON path expressions are evaluated individually in the order that they are listed in the array. The first path expression that returns a non-null and non-empty string value is used and the subsequent expressions are ignored. If the first path expression that returns a non-null and non-empty string value returns multiple values, IBM Operations Analytics - Log Analysis uses all the values to populate the indexed fields.

For example, imagine that you want to index a field that stores the host names that are included in the log message. However, IBM Operations Analytics - Log Analysis cannot extract the host name from some log records. In this case, you want to use the host name that is associated with the overall log record as a substitute. You use the following source attribute to do this:
"source": {
    "paths": ["Annotations.hosts.name", "hostname"],
    "combine": "FIRST"
}
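
The fallback behavior of the FIRST setting can be sketched as follows. The resolve and combine_first functions are hypothetical and only illustrate the rule that is described above. The sketch assumes that a dotted path visits every element of an array that it passes through, so that the first path that matches can still return several values.

```python
def resolve(node, keys):
    """Collect every non-null, non-empty value that a list of keys reaches."""
    if isinstance(node, list):
        # Assumption: a path fans out over each element of an array.
        return [value for item in node for value in resolve(item, keys)]
    if not keys:
        return [] if node in (None, "") else [node]
    if isinstance(node, dict) and keys[0] in node:
        return resolve(node[keys[0]], keys[1:])
    return []


def combine_first(record, paths):
    """FIRST: use all values from the first path that returns anything."""
    for path in paths:
        values = resolve(record, path.split("."))
        if values:
            return values  # later paths are ignored
    return []


paths = ["Annotations.hosts.name", "hostname"]

annotated = {
    "hostname": "host1.ibm.com",
    "Annotations": {"hosts": [{"name": "host2.ibm.com"}, {"name": "host3.ibm.com"}]},
}
not_annotated = {"hostname": "host1.ibm.com"}

print(combine_first(annotated, paths))      # prints ['host2.ibm.com', 'host3.ibm.com']
print(combine_first(not_annotated, paths))  # prints ['host1.ibm.com']
```
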

Example

The following abbreviated example shows the indexing configuration for the WebSphere® Insight Pack:

    {
      "indexConfigMeta" : {
        "description" : "Index Mapping Configuration for WAS SystemOut logs",
        "lastModified" : "11/01/2013",
        "name" : "WAS SystemOut Config",
        "version" : "0.4"
      },

      "timeZone" : "UTC",

      "fields" : {
        "className" : {
          "dataType" : "TEXT",
          "filterable" : true,
          "retrievable" : true,
          "retrieveByDefault" : true,
          "searchable" : true,
          "sortable" : false,
          "source" : {
            "paths" : [ "annotations.annotatorCommon_ClassnameOutput.span.text" ]
          },
          "tokenizer" : "literal"
        },
        "timestamp" : {
          "dataType" : "DATE",
          "filterable" : true,
          "retrievable" : true,
          "retrieveByDefault" : true,
          "searchable" : true,
          "sortable" : true,
          "source" : {
            "combine" : "FIRST",
            "dateFormats" : [
              "MM/dd/yy HH:mm:ss:SSS Z",
              "MM/dd/yy HH:mm:ss:SSS Z"
            ],
            "paths" : [
              "annotations.annotatorCommon_LogTimestamp.span.text",
              "metadata.timestamp"
            ]
          },
          "tokenizer" : "literal"
        }
      }
    }
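
The rules for the Source attribute can also be checked against a configuration such as the one in the example. The following Python sketch loads an abbreviated copy of the timestamp field and verifies that at least one path is defined, that combine is ALL or FIRST, and that a DATE field has the same number of paths and dateFormats elements. The check_source function is hypothetical and is not provided by IBM Operations Analytics - Log Analysis.

```python
import json

# Abbreviated copy of the timestamp field from the example configuration.
CONFIG_TEXT = """
{
  "fields": {
    "timestamp": {
      "dataType": "DATE",
      "source": {
        "combine": "FIRST",
        "dateFormats": ["MM/dd/yy HH:mm:ss:SSS Z", "MM/dd/yy HH:mm:ss:SSS Z"],
        "paths": ["annotations.annotatorCommon_LogTimestamp.span.text",
                  "metadata.timestamp"]
      }
    }
  }
}
"""


def check_source(name, field):
    """Hypothetical check of one field's source attribute; returns a list of problems."""
    problems = []
    source = field.get("source", {})
    paths = source.get("paths", [])
    if not paths:
        problems.append(f"{name}: source.paths needs at least one JSON path")
    # ALL is the default when combine is omitted.
    if source.get("combine", "ALL") not in ("ALL", "FIRST"):
        problems.append(f"{name}: combine must be ALL or FIRST")
    if field.get("dataType") == "DATE" and len(source.get("dateFormats", [])) != len(paths):
        problems.append(f"{name}: paths and dateFormats must have the same number of elements")
    return problems


config = json.loads(CONFIG_TEXT)
for field_name, field in config["fields"].items():
    print(field_name, check_source(field_name, field))  # prints: timestamp []
```
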