Custom annotations and splitters
To control how the system processes incoming log file records, you can define custom annotations and splitters for your Insight Pack.
Before IBM® Operations Analytics - Log Analysis indexes any data, it can split and annotate the incoming log file records. You can use either Annotation Query Language (AQL) rules or custom logic that is implemented in a technology such as Java™ or Python.
Splitting
Splitting describes how IBM Operations Analytics - Log Analysis separates physical log file records into logical records by using a logical boundary such as a timestamp or a new line. For example, when a timestamp is used as the logical boundary, all records from the beginning of the first detected timestamp are included in the logical record. The beginning of the next timestamp ends that logical record and starts the next one.
The logic that a splitter uses to manage incoming data records must adhere to a schema that is required by IBM Operations Analytics - Log Analysis. This is true for both AQL and custom logic splitters. Splitter logic processes records in batches, and a batch might not contain a complete set of logical log records. The splitter must therefore handle partial records that can occur at the start of the batch as well as at the end of the batch.
A splitter must distinguish incoming data records that form a complete log record from records that it must buffer until additional records arrive to complete them. It must also identify records that can be discarded, for example, records that the splitter determines are not going to be part of complete log records. The splitter logic processes a batch of incoming records and must split them on the defined boundary. It returns split records, each with a type that indicates to IBM Operations Analytics - Log Analysis how the record is to be handled.
Each split record contains the following information:
- Log text: the text that is contained in the log record after it is split.
- Timestamp: the timestamp, if there is one, that is associated with the log record.
- Type: a single character, A, B, or C, that indicates the type of this log record. The possible types are as follows:
- A: indicates a complete log record. The splitter logic determines that the associated record is complete, so the record can be sent to the annotation and indexing processes. In the following example, the first record is of type A and the second is of type B, because the start of the second record indicates to the splitter that the first record is complete:
[9/21/12 14:31:13:117 GMT+05:30] 0000003e InternalGener I DSRA8203I: Database product name : D2/LINUXX8664
[9/21/12 14:31:13:119 GMT+05:30] 0000003e InternalGener I DSRA8204I: Database product version : SQL09070
- B: indicates a partial log record at the end of the batch. For example, the splitter detects the start of a new logical record but cannot determine whether it is complete, because it cannot find the next logical record boundary that indicates the start of the next record. The splitter marks the record as type B to indicate to the IBM Operations Analytics - Log Analysis server that the record is partial and must be buffered until more incoming records arrive to complete the logical record. The server sends all type A log records for annotation and indexing, and it buffers type B records. When the server receives more input records, it prefixes the buffered type B records to the next batch of input that is sent to the splitter. For example:
[9/21/12 14:31:27:882 GMT+05:30] 00000051 servlet E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: Uncaught exception created in one of the service methods of the servlet TradeAppServlet in application DayTrader2-EE5. Exception created : javax.servlet.ServletException: TradeServletAction.doLogout (...)exception logging out user uid:1
    at org.apache.geronimo.samples.daytrader.web.TradeServletAction.doLogout(TradeServletAction.java:458)
    at org.apache.geronimo.samples.daytrader.web.TradeAppServlet.performTask(TradeAppServlet.java:169)
    at org.apache.geronimo.samples.daytrader.web.TradeAppServlet.doGet(TradeAppServlet.java:78)
- C: indicates that the text can be discarded. The IBM Operations Analytics - Log Analysis server discards this text. It is neither buffered nor sent for annotation and indexing. Define the splitter so that it marks text as type C only if it is certain that the text cannot belong to a log record that is still incomplete. For example, a partial log record is detected at the beginning of a batch of records, and then a complete but unrelated logical log record is found. IBM Operations Analytics - Log Analysis can never complete the partial record that was detected first, so that record must be marked as type C and discarded. For example:
************ Start Display Current Environment ************
WebSphere Platform 7.0.0.0 [ND 7.0.0.0 r0835.03] running with process name cldftp48Node01Cell\cldftp48Node01\server1 and process id 28811
Host Operating System is Linux, version 2.6.18-194.el5
Java version = 1.6.0, Java Compiler = j9jit24, Java VM name = IBM J9 VM
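The type A, B, and C handling described above can be sketched in Python. This is a minimal illustration, not the schema that IBM Operations Analytics - Log Analysis requires: the function names `split_batch` and `feed`, the dictionary keys, and the timestamp pattern are all assumptions made for this example. It also assumes that a batch is a single string and that text before the first timestamp in a batch can never be completed, because buffered type B text always begins with a timestamp.

```python
import re

# Assumed boundary: a WebSphere-style timestamp at the start of a line,
# for example [9/21/12 14:31:13:117 GMT+05:30].
TIMESTAMP = re.compile(
    r"^\[(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}:\d{3} [^\]]+)\]",
    re.MULTILINE,
)


def split_batch(batch):
    """Split a text batch into records typed A, B, or C (hypothetical format)."""
    boundaries = list(TIMESTAMP.finditer(batch))
    records = []

    # Text before the first boundary has no record start, so it is discarded.
    leading = batch[: boundaries[0].start()] if boundaries else batch
    if leading.strip():
        records.append({"text": leading, "timestamp": None, "type": "C"})

    for i, match in enumerate(boundaries):
        is_last = i == len(boundaries) - 1
        end = len(batch) if is_last else boundaries[i + 1].start()
        records.append({
            "text": batch[match.start():end],
            "timestamp": match.group(1),
            # The last record has no following boundary, so it may be partial.
            "type": "B" if is_last else "A",
        })
    return records


def feed(batches):
    """Simulate the server side: index type A, buffer type B, drop type C."""
    indexed, buffered = [], ""
    for batch in batches:
        # Buffered type B text is prefixed to the next batch of input.
        records = split_batch(buffered + batch)
        buffered = ""
        for record in records:
            if record["type"] == "A":
                indexed.append(record["text"])
            elif record["type"] == "B":
                buffered = record["text"]
    return indexed, buffered
```

Calling `feed` with two batches in which a record straddles the batch boundary shows the buffered type B text being completed by the second batch and then indexed as type A.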
Annotating
After the log records are split, the logical records are sent to the annotation engine. The engine uses rules that are written in AQL, or custom logic that is written in Java or Python, to extract important pieces of information that are sent to the indexing engine. IBM Operations Analytics - Log Analysis represents the results of the annotation process in a JavaScript Object Notation (JSON) data structure called annotations. The annotations JSON structure is part of a larger structure that also contains the original log record text (the content key) and the metadata that is passed into the REST API (the metadata key). You can reference the annotations structure to access the actual values from the annotation result.
For more information, see the example.
You can reference the annotation results in the source.paths attributes that are contained in the field definitions in the indexing configuration. You use dot notation to indicate where the values of the fields that are indexed are located in the annotations structure.
{
  "annotations" : {
    "annotatorCommon_EventTypeOutput" : [
      {
        "field_type" : "EventTypeWS",
        "span" : { "begin" : 57, "end" : 58, "text" : "E" },
        "text" : "E"
      }
    ],
    "annotatorCommon_LogTimestamp" : [
      {
        "span" : { "begin" : 1, "end" : 32, "text" : "03/24/13 07:16:28:103 GMT+05:30" }
      }
    ],
    "annotatorCommon_MsgIdOutput" : [
      {
        "field_type" : "MsgId",
        "span" : { "begin" : 59, "end" : 68, "text" : "DSRA1120E" },
        "text" : "DSRA1120E"
      }
    ],
    "annotatorCommon_ShortnameOutput" : [
      {
        "field_type" : "ShortnameWS",
        "span" : { "begin" : 43, "end" : 56, "text" : "TraceResponse" },
        "text" : "TraceResponse"
      }
    ],
    "annotatorCommon_ThreadIDOutput" : [
      {
        "field_type" : "ThreadIDWS",
        "span" : { "begin" : 34, "end" : 42, "text" : "00000010" },
        "text" : "00000010"
      }
    ],
    "annotatorCommon_msgText" : [
      {
        "fullMsg" : {
          "begin" : 59,
          "end" : 167,
          "text" : "DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled."
        },
        "span" : {
          "begin" : 70,
          "end" : 167,
          "text" : "Application did not explicitly close all handles to this Connection. Connection cannot be pooled."
        }
      }
    ]
  },
  "content" : {
    "span" : {
      "begin" : 1,
      "end" : 169,
      "text" : "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse E DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled.\n"
    },
    "text" : "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse E DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled.\n"
  },
  "metadata" : {
    "batchsize" : "506",
    "flush" : true,
    "hostname" : "mylogfilehost",
    "inputType" : "logs",
    "logpath" : "/data/unityadm/IBM/LogAnalysis/logsources/was/SystemOut.log",
    "datasource" : "WAS system out",
    "regex_class" : "AllRecords",
    "timestamp" : "03/24/13 07:16:28:103 GMT+05:30",
    "type" : "A"
  }
}
The structure contains the following top-level keys:
- Annotations: provides access to the annotation results that the annotation engine creates when it processes an incoming log record according to AQL rules or custom logic.
- Content: provides access to the raw logical log record.
- Metadata: provides access to some of the metadata that describes the file that the log record was obtained from, for example, the host name or data source. In general, the metadata section contains any name/value pairs that a client sends to the IBM Operations Analytics - Log Analysis server along with the log data.
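To illustrate how custom logic might produce one entry of the annotations structure, the following Python sketch extracts a message ID and records its span. It is an assumption-laden example, not the annotator interface of the product: the function name `annotate_msg_id` and the message ID pattern are hypothetical, and only the output shape (field_type, span with begin, end, and text, and text) is taken from the example structure.

```python
import re

# Assumed message ID shape: 4-5 uppercase letters, 4 digits, 1 type letter.
MSG_ID = re.compile(r"\b[A-Z]{4,5}\d{4}[A-Z]\b")


def annotate_msg_id(text):
    """Return a list with one MsgId annotation, or an empty list if none is found."""
    match = MSG_ID.search(text)
    if match is None:
        return []
    value = match.group(0)
    return [{
        "field_type": "MsgId",
        # Zero-based character offsets into the log record text.
        "span": {"begin": match.start(), "end": match.end(), "text": value},
        "text": value,
    }]


record_text = (
    "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse "
    "E DSRA1120E: Application did not explicitly close all handles "
    "to this Connection. Connection cannot be pooled.\n"
)
annotations = {"annotatorCommon_MsgIdOutput": annotate_msg_id(record_text)}
```

For this record text, the sketch reports the message ID DSRA1120E with begin 59 and end 68, the same offsets that appear in the example structure.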
When you create the indexing configuration, you can set the value of the sourcepaths attribute for each field to a dot notation reference to an attribute within the input JSON data structure.
For example, to index the MsgId from the previous example, use either of the following dot notation references, both of which reference the actual value DSRA1120E:
annotations.annotatorCommon_MsgIdOutput.text
annotations.annotatorCommon_MsgIdOutput.span.text
You can also reference the content and metadata structures in the sourcepaths attribute value of each field to be indexed. For example:
content.text
metadata.hostname
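The way a dot notation reference locates a value can be sketched with a small Python helper. This is only an illustration of the idea, not how the product resolves paths: the function name `resolve` is hypothetical, and the choice to look inside every element of a list (annotation outputs are arrays) is an assumption of this sketch.

```python
def resolve(structure, path):
    """Return every value that a dot notation path reaches in a nested structure."""
    current = [structure]
    for key in path.split("."):
        next_level = []
        for node in current:
            # Annotation outputs are lists, so look inside each element.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    return current


# A trimmed-down copy of the example structure.
record = {
    "annotations": {
        "annotatorCommon_MsgIdOutput": [
            {
                "field_type": "MsgId",
                "span": {"begin": 59, "end": 68, "text": "DSRA1120E"},
                "text": "DSRA1120E",
            }
        ]
    },
    "content": {"text": "[03/24/13 07:16:28:103 GMT+05:30] 00000010 ..."},
    "metadata": {"hostname": "mylogfilehost"},
}

msg_id = resolve(record, "annotations.annotatorCommon_MsgIdOutput.text")
hostname = resolve(record, "metadata.hostname")
```

A path that does not exist in the structure yields an empty list in this sketch, which makes a mistyped reference easy to spot before it is used in a field definition.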
For more information about indexing configuration, see Indexing configuration in the Extending guide.