Trace facility

The trace facility helps you isolate system problems by monitoring selected system events or selected processes. Events that can be monitored include entry to and exit from selected subroutines, kernel routines, kernel extension routines, and interrupt handlers.

Trace can also be restricted to tracing a set of running processes or threads, or it can be used to initiate and trace a program.

When the trace facility is active, information is recorded in a system trace log file. The trace facility includes commands for activating and controlling traces and generating trace reports. Applications and kernel extensions can use several subroutines to record additional events.


The trace facility overview

The trace facility is in the bos.sysmgt.trace file set. To see if this file set is installed, type the following on the command line:

lslpp -l | grep bos.sysmgt.trace

If the output includes a line containing bos.sysmgt.trace, the file set is installed. Otherwise, you must install it.

The system trace facility records trace events, which can later be formatted by the trace report command, trcrpt. Trace hooks are compiled into kernel or application code, but events are recorded only while tracing is active.

Tracing is activated with the trace command or the trcstart subroutine. Tracing is stopped with either the trcstop command or the trcstop subroutine. While active, tracing can be suspended or resumed with the trcoff and trcon commands, or the trcoff and trcon subroutines.
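The start, suspend, resume, and stop sequence described above can be pictured as a small state machine. The following C sketch is a hypothetical model, not AIX source: the names trc_state, trc_request, trc_next, and hook_records are invented for illustration, and the model assumes that a request that does not apply (for example, trcon on a channel that was never started) simply leaves the state unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one trace channel; not the AIX implementation. */
typedef enum { TRC_INACTIVE, TRC_COLLECTING, TRC_SUSPENDED } trc_state;

/* Requests that drive the model, named after the commands/subroutines. */
typedef enum { REQ_START, REQ_TRCOFF, REQ_TRCON, REQ_TRCSTOP } trc_request;

/* Returns the state after a request; requests that do not apply leave
   the state unchanged (an assumption of this model). */
static trc_state trc_next(trc_state s, trc_request req)
{
    assert(req <= REQ_TRCSTOP);
    switch (req) {
    case REQ_START:   /* trace command or trcstart subroutine */
        return s == TRC_INACTIVE ? TRC_COLLECTING : s;
    case REQ_TRCOFF:  /* suspend; the channel stays active */
        return s == TRC_COLLECTING ? TRC_SUSPENDED : s;
    case REQ_TRCON:   /* resume; only a channel that was already started */
        return s == TRC_SUSPENDED ? TRC_COLLECTING : s;
    case REQ_TRCSTOP: /* stop and deactivate */
        return TRC_INACTIVE;
    }
    return s;
}

/* A hook compiled into the code records an event only while collecting. */
static bool hook_records(trc_state s)
{
    return s == TRC_COLLECTING;
}
```

In this model, the behavior of the -f flag, which turns tracing off when the buffer fills as if trcoff had been issued, corresponds to a REQ_TRCOFF transition.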

After the trace has been stopped with trcstop, you can generate a trace report with the trcrpt command. This command uses the template file /etc/trcfmt to determine how to format each entry. The templates are installed with the trcupdate command; see the trcupdate command for a discussion of the templates.

Controlling the trace

The trace command starts the tracing of system events and controls the trace buffer and log file sizes. This command is documented in the article on the trace daemon in the Commands Reference.

There are three methods of gathering trace data.

  1. The default method uses two buffers to gather trace data continuously, writing one buffer to the log file while data is being collected in the other. The log file wraps when it becomes full.
  2. The circular method gathers trace data continuously, keeping only the most recent data in the buffer, and writes the data to the log file only when the trace is stopped. This is particularly useful for debugging a problem when you know when the problem happens and want to capture the data at that time: start the trace at any time, stop it right after the problem occurs, and you will have captured the events around the problem. This method is enabled with the -l trace daemon flag.
  3. The third method uses a single trace buffer. When that buffer fills, trace stops collecting data and writes the buffer to the log file. The trace is not stopped at this point; rather, tracing is turned off as if a trcoff command had been issued, and you will usually want to stop the trace with the trcstop command. This method is most often used to gather performance data when you do not want trace to perform I/O or buffer swapping until the data has been gathered. Use the -f flag to enable this method.
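The difference between the circular method and the single-buffer method can be modeled with a small ring buffer. The following C sketch is illustrative only: it assumes a tiny fixed capacity of four events (MODEL_CAP), stores bare integer event IDs rather than real trace entries, does not model the default two-buffer method, and uses hypothetical names (trace_buf, buf_record, buf_dump).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP 4  /* hypothetical buffer capacity, in events */

typedef enum { MODE_CIRCULAR, MODE_STOP_WHEN_FULL } buf_mode;

typedef struct {
    buf_mode mode;
    int events[MODEL_CAP];
    size_t head;      /* index of the oldest event held */
    size_t count;     /* number of events held */
    bool collecting;  /* false once a stop-when-full buffer has filled */
} trace_buf;

static void buf_init(trace_buf *b, buf_mode mode)
{
    b->mode = mode;
    b->head = 0;
    b->count = 0;
    b->collecting = true;
}

static void buf_record(trace_buf *b, int event)
{
    if (!b->collecting)
        return;                     /* like trcoff: the hook is ignored */
    if (b->count < MODEL_CAP) {
        b->events[(b->head + b->count) % MODEL_CAP] = event;
        b->count++;
        if (b->count == MODEL_CAP && b->mode == MODE_STOP_WHEN_FULL)
            b->collecting = false;  /* -f style: turn tracing off when full */
        return;
    }
    /* Circular mode with a full buffer: overwrite the oldest event. */
    b->events[b->head] = event;
    b->head = (b->head + 1) % MODEL_CAP;
}

/* Copies the held events, oldest first, into out (which must hold
   MODEL_CAP ints); stands in for writing the buffer to the log file. */
static size_t buf_dump(const trace_buf *b, int *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < b->count; i++)
        out[i] = b->events[(b->head + i) % MODEL_CAP];
    return b->count;
}
```

Recording events 1 through 6 in circular mode leaves 3, 4, 5, and 6, the most recent events, in the buffer. In stop-when-full mode the buffer keeps 1, 2, 3, and 4, and later events are ignored.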

You will usually want to run the trace command asynchronously; that is, you enter the trace command and then continue with other work. To run the trace asynchronously, use the -a flag or the -x flag. If you use the -a flag, you must then stop the trace with the trcstop command. If you use the -x flag, trace automatically stops when the program finishes.

It is usually desirable to limit the information that is traced. Use the -j events or -k events flags to specify a set of events to include (-j) or exclude (-k).
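Conceptually, -j records only the listed events and -k records every event except the listed ones. The following C sketch models that decision for a single hook ID. The names are hypothetical, the model handles one list at a time, and it does not describe how trace combines -j with -k or expands event groups such as tidhk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of event selection; not the trace command's code. */
typedef enum { FILTER_NONE, FILTER_INCLUDE, FILTER_EXCLUDE } filter_kind;

typedef struct {
    filter_kind kind;       /* FILTER_NONE traces every event */
    const unsigned *hooks;  /* hexadecimal hook IDs, e.g. 0x254 */
    size_t nhooks;
} event_filter;

static bool hook_listed(const event_filter *f, unsigned hook)
{
    assert(f->nhooks == 0 || f->hooks != NULL);
    for (size_t i = 0; i < f->nhooks; i++)
        if (f->hooks[i] == hook)
            return true;
    return false;
}

/* Decides whether an event with this hook ID would be recorded. */
static bool should_trace(const event_filter *f, unsigned hook)
{
    switch (f->kind) {
    case FILTER_INCLUDE:
        return hook_listed(f, hook);   /* like -j: only listed events */
    case FILTER_EXCLUDE:
        return !hook_listed(f, hook);  /* like -k: all but listed events */
    case FILTER_NONE:
    default:
        return true;
    }
}
```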

Note: When you limit the trace to specific processes or threads, you also limit the amount of information traced.

To display the program names associated with trace hooks, certain hooks must be enabled. These are specified using the tidhk trace event group. For example, to trace the mbuf hook, 254, and also show program names, run trace as follows:

trace -aJ tidhk -j 254

Tracing is now active. To stop tracing and then generate a report that shows program names, enter the following on a command line:

trcstop
trcrpt -O exec=on

The -O exec=on option of trcrpt shows the program names. See the trcrpt command for more information.

It is often desirable to specify the buffer size and the maximum log file size. The trace buffers require real memory to be available so that no paging is necessary to record trace hooks. The log file fills to the maximum size specified and then wraps around, discarding the oldest trace data. The -T size and -L size flags specify, in bytes, the size of the memory buffers and the maximum size of the trace data in the log file, respectively.

Note: Because the trace facility pins the data collection buffers, making this amount of memory unavailable to the rest of the system, the trace facility can impact performance in a memory-constrained environment. If the application being monitored is not memory-constrained, or if the percentage of memory consumed by the trace routine is small compared to what is available in the system, the impact of trace “stolen” memory should be small. If you do not specify a value, trace uses the default sizes.

Tracing can also be controlled from an application. See the trcstart and trcstop subroutine articles.

Recording trace event data

There are two types of trace data.

Generic data
consists of a data word, a buffer of opaque data, and the opaque data's length. This is useful for tracing items such as path names. For more information, see Trace facility generic trace channels.
Note: Tracing of specific processes or threads is only supported for channel 0. It is not supported for generic trace channels.
Non-generic data
This is what is normally traced by the AIX® operating system. Each entry of this type consists of a hook word and up to five words of trace data. For a 64-bit application, these are 8-byte words. The C programmer should use the macros TRCHKL0 through TRCHKL5 and TRCHKL0T through TRCHKL5T, defined in the /usr/include/sys/trcmacros.h file, to record non-generic data. If these macros cannot be used, see the article on the utrchook subroutine.

Generating a trace report

See the trcrpt command article for a full description of trcrpt. This command is used to generate a readable trace report from the log file generated by the trace command. By default the command formats data from the default log file, /var/adm/ras/trcfile. The trcrpt output is written to standard output.

To generate a trace report from the default log file and write it to /tmp/rptout, enter the following:

trcrpt >/tmp/rptout

To generate a trace report from the log file /tmp/tlog that includes program names and system call names, and write it to /tmp/rptout, use the following:

trcrpt -O exec=on,svc=on /tmp/tlog >/tmp/rptout

Extracting trace data from a dump

If trace was active when the system took a dump, the trace data can usually be retrieved with the trcdead command. To avoid overwriting the default trace log file on the current system, use the -o output-file option.

For example:

trcdead -o /tmp/tlog /var/adm/ras/vmcore.0

creates a trace log file /tmp/tlog which may then be formatted with the following:

trcrpt /tmp/tlog

Trace facility commands

The following commands are part of the trace facility:

Command Function
trace Starts the tracing of system events. With this command, you can control the size and manage the trace log file as well as the internal trace buffers that collect trace event data.
trcdead Extracts trace information from a system dump. If the system halts while the trace facilities are active, the contents of the internal trace buffers are captured. This command extracts the trace event data from the dump and writes it to the trace log file.
trcnm Generates a kernel name list used by the trcrpt command. A kernel name list is composed of a symbol table and a loader symbol table of an object file. The trcrpt command uses the kernel name list file to interpret addresses when formatting a report from a trace log file.
Note: It is recommended that you use the -n trace option instead of trcnm. This puts name list information into the trace log file instead of a separate file, and includes symbols from kernel extensions.
trcrpt Formats reports of trace event data contained in the trace log file. You can specify the events to be included (or omitted) in the report, as well as determine the presentation of the output with this command. The trcrpt command uses the trace formatting templates stored in the /etc/trcfmt file to determine how to interpret the data recorded for each event.
trcstop Stops the tracing of system events.
trcupdate Updates the trace formatting templates stored in the /etc/trcfmt file. When you add applications or kernel extensions that record trace events, templates for these events must be added to the /etc/trcfmt file. The trcrpt command uses the trace formatting templates to determine how to interpret the data recorded for each event. Software products that record events usually run the trcupdate command as part of the installation process.

Trace facility calls and subroutines

The following calls and subroutines are part of the trace facility:

Subroutine Description
trcgen, trcgent

Records trace events of more than five words of data. The trcgen subroutine can be used to record an event as part of the system event trace (trace channel 0) or to record an event on a generic trace channel (channels 1 through 7). Specify the channel number in a subroutine parameter when you record the trace event. The trcgent subroutine appends a time stamp to the event data. When using AIX 5L Version 5.3 with the 5300-05 Technology Level and above, the time stamp is always appended to the event data regardless of the subroutine used. Use trcgenk and trcgenkt in the kernel. C programmers should always use the TRCGEN and TRCGENK macros.

utrchook, utrchook64 Records trace events of up to five words of data. These subroutines can be used to record an event as part of the system event trace (trace channel 0). Kernel programmers can use trchook and trchook64. C programmers should always use the TRCHKL0 - TRCHKL5 and TRCHKL0T - TRCHKL5T macros.

If you are not using these macros, you need to build your own trace hook word. The format is documented with the /etc/trcfmt file. Note that the 32-bit and 64-bit traces have different hook word formats.

trcoff Suspends the collection of trace data on either the system event trace channel (channel 0) or a generic trace channel (1 through 7). The trace channel remains active and trace data collection can be resumed by using the trcon subroutine.
trcon Starts the collection of trace data on a trace channel. The channel can be either the system event trace channel (0) or a generic channel (1 through 7). The trace channel, however, must have been previously activated by using the trace command or the trcstart subroutine. You can suspend trace data collection by using the trcoff subroutine.
trcstart Provides a library interface to the trace command. It returns the channel number of the trace it starts. If a generic channel is requested, the channel number is in the range 1 through 7. Otherwise, the channel number is 0.
trcstop Frees and deactivates a generic trace channel.
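The table implies a simple rule for choosing a recording interface: the hook macros handle up to five data words on channel 0, while the trcgen family handles longer events and the generic channels. The following C sketch encodes that rule. The enum and function names are hypothetical, and the sketch only suggests which family to use without calling either one.

```c
#include <assert.h>

typedef enum { USE_TRCHKL, USE_TRCGEN, USE_INVALID } recorder;

/* Suggests a recording family for an event of nwords data words on a
   channel: 0 is the system event trace, 1-7 are generic channels. */
static recorder pick_recorder(int channel, int nwords)
{
    if (channel < 0 || channel > 7 || nwords < 0)
        return USE_INVALID;
    if (channel != 0)
        return USE_TRCGEN;   /* hook macros record only on channel 0 */
    if (nwords <= 5)
        return USE_TRCHKL;   /* TRCHKLn (or TRCHKLnT) with n = nwords */
    return USE_TRCGEN;       /* more than five words: TRCGEN */
}
```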

Trace facility files

File Description
/etc/trcfmt Contains the trace formatting templates used by the trcrpt command to determine how to interpret the data recorded for each event.
/var/adm/ras/trcfile Contains the default trace log file. The trace command allows you to specify a different trace log file.
/usr/include/sys/trchkid.h Contains trace hook identifier definitions.
/usr/include/sys/trcmacros.h Contains commonly used macros for recording trace events.

Trace event data

See the /etc/trcfmt file for the format of the trace event data.

Trace hook identifiers

A trace hook identifier is a three- or four-digit hexadecimal number that identifies an event being traced. Prior to AIX 7.1, and for 32-bit applications running on AIX 7.1 and above, only three-digit hook identifiers can be used. When using a tracing macro such as TRCHKL1, you specify the trace hook as:
hhh00000
where hhh is the hook id.
On 64-bit applications and kernel routines running on AIX 7.1 and above, three- and four-digit identifiers can be used. When using a tracing macro such as TRCHKL1, you specify the trace hook as:
hhhh0000
where hhhh is the hook id.
Note: If a four-digit identifier is used and the identifier is less than 0x1000, the least-significant digit must be 0 (of the form 0x0hh0).

A three-digit identifier has an implicit 0 in its least-significant digit so that a 32-bit hook identifier is equivalent to a 64-bit hook of the form hhh0.
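The arithmetic behind these forms is a left shift of the hook ID into the high-order digits of a 32-bit value. The following C sketch computes the hhh00000 and hhhh0000 values described above and checks the rule in the note. It is an illustration with hypothetical function names; it does not reproduce the complete hook word format documented with the /etc/trcfmt file, and real code should use the TRCHKL macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Three-digit ID hhh -> hhh00000 (the ID occupies the top 12 bits). */
static uint32_t hook_value_3digit(uint32_t hhh)
{
    assert(hhh <= 0xFFF);
    return hhh << 20;
}

/* Four-digit ID hhhh -> hhhh0000 (the ID occupies the top 16 bits). */
static uint32_t hook_value_4digit(uint32_t hhhh)
{
    assert(hhhh <= 0xFFFF);
    return hhhh << 16;
}

/* A three-digit ID hhh is equivalent to the four-digit ID hhh0. */
static uint32_t widen_hook_id(uint32_t hhh)
{
    assert(hhh <= 0xFFF);
    return hhh << 4;
}

/* The note's rule: a four-digit ID below 0x1000 must have the form 0x0hh0. */
static bool valid_4digit_id(uint32_t hhhh)
{
    return hhhh <= 0xFFFF && (hhhh >= 0x1000 || (hhhh & 0xF) == 0);
}
```

For the mbuf hook 254, hook_value_3digit(0x254) and hook_value_4digit(widen_hook_id(0x254)) both yield 0x25400000, which is the equivalence the paragraph describes.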

Most trace hook identifiers are defined in the /usr/include/sys/trchkid.h file. The values 0x0100 through 0x0FF0 are available for use by 64-bit user applications. The values 0x010 through 0x0FF are available for use by 32-bit user applications. All other values are reserved for system use. The currently defined trace hook identifiers can be listed using the trcrpt -j command.
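These ranges, together with the least-significant-digit rule for four-digit identifiers below 0x1000, can be checked mechanically. The following C sketch uses hypothetical helper names that are not part of any AIX header; it shows that the 64-bit user range is the 32-bit user range widened by one hexadecimal digit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit user applications: three-digit IDs 0x010 through 0x0FF. */
static bool user_hook_id_32(uint32_t hhh)
{
    assert(hhh <= 0xFFF);
    return hhh >= 0x010 && hhh <= 0x0FF;
}

/* 64-bit user applications: 0x0100 through 0x0FF0. Because these IDs are
   below 0x1000, the least-significant digit must also be 0. */
static bool user_hook_id_64(uint32_t hhhh)
{
    assert(hhhh <= 0xFFFF);
    return hhhh >= 0x0100 && hhhh <= 0x0FF0 && (hhhh & 0xF) == 0;
}
```

For example, the mbuf hook 254 falls outside the 32-bit user range, so it is one of the values reserved for system use.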

Trace facility generic trace channels

The trace facility supports up to eight active trace sessions at a time. Each trace session uses a channel of the multiplexed trace special file, /dev/systrace. Channel 0 is used by the trace facility to record system events. The tracing of system events is started and stopped by the trace and trcstop commands. If you trace specific processes or threads, or if a program is traced, only channel 0 is used. Channels 1 through 7 are referred to as generic trace channels and can only be used by subsystems for other types of tracing such as data link tracing.

To implement tracing using the generic trace channels of the trace facility, a subsystem calls the trcstart subroutine to activate a trace channel and to determine the channel number. The subsystem modules can then record trace events using the TRCGEN or TRCGENT macros or, if necessary, the trcgen, trcgent, trcgenk, or trcgenkt subroutines. The channel number returned by the trcstart subroutine is one of the parameters that must be passed to these subroutines. The subsystem can suspend and resume trace data collection using the trcoff and trcon subroutines and can deactivate a trace channel using the trcstop subroutine. The subsystem must provide the user interface to activate and deactivate subsystem tracing.
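The channel bookkeeping described above can be modeled as a table of eight slots in which channel 0 is reserved for system events. The following C sketch is hypothetical: it does not call trcstart or trcstop, it assumes the lowest free generic channel is handed out first, and it uses -1 as its own "no channel available" result.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCHANNELS 8  /* channel 0 plus generic channels 1-7 */

typedef struct {
    bool active[MODEL_NCHANNELS];
} channel_table;

/* Hypothetical stand-in for a generic-channel request: returns a free
   channel in the range 1-7, or -1 if none is free. Channel 0 is never
   handed out because it is reserved for system events. */
static int model_start_generic(channel_table *t)
{
    for (int ch = 1; ch < MODEL_NCHANNELS; ch++) {
        if (!t->active[ch]) {
            t->active[ch] = true;
            return ch;
        }
    }
    return -1;
}

/* Hypothetical stand-in for freeing and deactivating a generic channel. */
static void model_stop_generic(channel_table *t, int ch)
{
    assert(ch >= 1 && ch < MODEL_NCHANNELS);
    t->active[ch] = false;
}
```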

The trace hook IDs, most of which are stored in the /usr/include/sys/trchkid.h file, and the trace formatting templates, which are stored in the /etc/trcfmt file, are shared by all the trace channels.