IBM Tivoli Netcool/OMNIbus, Version 7.4

Run the ObjectServer with profiling enabled

Use profiling to measure the amount of time spent running SQL queries on the ObjectServer and to identify which client connections are using up excessive resources.

To enable profiling on the ObjectServer:
  1. Ensure that the Profile property is set to TRUE. (This is the default value.)
  2. Use the ProfileStatsInterval property to specify an interval at which profiling information is written to the profile log file. A default interval of 60 seconds is used if you do not change this value.
  3. Ensure that the profiler_triggers trigger group and its triggers (profiler_group_report, profiler_report, profiler_toggle) are enabled for profile logging.

Timing information for running SQL commands from client connections is logged to the catalog.profiles table. You can use Netcool/OMNIbus Administrator to view details that are recorded in the catalog.profiles table. From the Netcool/OMNIbus Administrator window, select the System menu button and then click Databases. You can use the Data View tab on the Databases, Tables and Columns pane to view table data, and use the Column Definitions tab to view detailed information about the columns in the table.

Profile statistics are also logged to a profile log file $NCHOME/omnibus/log/servername_profiler_report.logn, where servername represents the ObjectServer name and n is a number. The profile log file shows a breakdown of the time spent for each client connection and the total time spent by client type, for each granularity period (as set by the Granularity property). Each client shown in the log file is identified by a standard default name (for example, GATEWAY or PROBE) and the host on which the client is running. You can use the profile log file to analyze how the ObjectServer spent its time during each granularity period and calculate the percentage of time used. For example, if the granularity period is set to 60 seconds and the total time spent for all the connections during a particular period was 30 seconds, you can calculate that the ObjectServer spent 50% of its available time on running SQL commands from client connections.
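The percentage calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `sql_time_percentage` and its parameters are hypothetical and are not part of Netcool/OMNIbus.

```python
def sql_time_percentage(total_client_seconds: float, granularity_seconds: float) -> float:
    """Return the share of one granularity period spent on client SQL, as a percentage.

    Both arguments are values you read from the profile log file and the
    ObjectServer configuration; this helper is a hypothetical convenience.
    """
    if granularity_seconds <= 0:
        raise ValueError("granularity_seconds must be positive")
    return 100.0 * total_client_seconds / granularity_seconds


# The example from the text: 30 seconds of client time in a 60-second period.
print(sql_time_percentage(30.0, 60.0))  # 50.0
```

Because the ObjectServer is multithreaded, the result can exceed 100 for a busy period.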

The work completed in a report period is summarized in a single line for each granularity period, in the following format: Total time in the report period (profiling period): total time by all clients. Because the ObjectServer is multithreaded, the total time by all clients can exceed the profiling period, particularly on multi-CPU systems. If the profiling period is greater than the configured profiling period, the ObjectServer is too busy to report the profiling time on schedule and might be overloaded. If the total time by all clients is greater than the profiling period, the system is under load, but this does not necessarily indicate a problem.
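As a minimal sketch of how the two checks on the summary line could be automated, the following Python snippet parses a summary line and compares the values. The function name `check_summary`, the regular expression, and the `tolerance` parameter (how far the reported period may exceed the configured period before it is flagged) are assumptions made for illustration; they are not defined by Netcool/OMNIbus.

```python
import re

# Matches: Total time in the report period (<profiling period>s): <total time>s
SUMMARY_RE = re.compile(
    r"Total time in the report period \((?P<period>\d+(?:\.\d+)?)s\): "
    r"(?P<total>\d+(?:\.\d+)?)s"
)


def check_summary(line, configured_period=60.0, tolerance=0.05):
    """Parse one summary line; return None if the line is not a summary line.

    late_report: the profiling period overran the configured period
                 (possible overload). The tolerance is a hypothetical margin.
    under_load:  the total client time exceeds the profiling period
                 (load, but not necessarily a problem).
    """
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    period = float(match.group("period"))
    total = float(match.group("total"))
    return {
        "period": period,
        "total": total,
        "late_report": period > configured_period * (1 + tolerance),
        "under_load": total > period,
    }


line = "Mon Oct 12 17:39:46 2009: Total time in the report period (59.275782s): 29.980000s"
print(check_summary(line))
```

Adjust `configured_period` to match the interval that is configured for your ObjectServer.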

Sample output recorded in a profile log file for a granularity period is as follows:
[1]  Mon Oct 12 17:39:46 2009: Individual user profiles:
[2]  Mon Oct 12 17:39:46 2009: 'Administrator' (uid = 0) time on adminhost: 0.000000s
[3]  Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s
[4]  Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s
[5]  Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s
[6]  Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 45) time on omnihost1.ibm.com: 0.000000s
[7]  Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 45) time on omnihost1.ibm.com: 9.870000s
[8]  Mon Oct 12 17:39:46 2009: 'c@xxxxx@xxxxx:11.0' (uid = 55) time on omnihost1.ibm.com: 0.000000s
[9]  Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 55) time on omnihost1.ibm.com: 6.020000s
[10] Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.270000s
[11] Mon Oct 12 17:39:46 2009: 'GATEWAY' (uid = 0) time on omnihost1.ibm.com: 0.000000s
[12] Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on omnihost1.ibm.com: 3.010000s
[13] Mon Oct 12 17:39:46 2009: Grouped user profiles:
[14] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'PROBE': 8.020000s
[15] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'GATEWAY': 0.270000s
[16] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'c@xxxxx@xxxxx:11.0': 0.000000s
[17] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'e@c0B4D@c0142:11.0': 25.93000s
[18] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'isql': 3.77000s
[19] Mon Oct 12 17:39:46 2009: Execution time for all connections whose application name is 'Administrator': 0.000000s
[20] Mon Oct 12 17:39:46 2009: Total time in the report period (59.275782s): 29.980000s
The line numbers are included in the preceding output to help describe the entries:
  • Line [1]: Introduces a list of individual clients that are connected to the ObjectServer.
  • Line [2]: Shows the application name for the connected client (Administrator), the associated user for that client (user ID 0), the host computer (adminhost), and the amount of time the client has used in the last profiling period (0.000000s).
  • Line [13]: Introduces a list that shows the aggregated time for all clients of the same type.
  • Line [14]: Shows that the two connected probes used a combined time of 8.02 seconds.
  • Line [17]: Shows that the event lists used 25.93 seconds. Consider investigating the individual times to see which event list is using the most time.
  • Line [20]: Shows the profiling period as 59.27 seconds and the total time by all clients as 29.98 seconds. The profiling period is approximately the same as the configured profiling period of 60 seconds, which is expected if the system is not overloaded.
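To find the connections that use the most time, the individual profile entries can be extracted and sorted. The following Python sketch assumes that the entries follow the layout shown in the sample output; the function name `rank_connections` and the regular expression are hypothetical and would need adjusting if your log format differs.

```python
import re

# Matches: '<application name>' (uid = <n>) time on <host>: <seconds>s
PROFILE_RE = re.compile(
    r"'(?P<app>[^']*)' \(uid = (?P<uid>\d+)\) time on (?P<host>\S+): "
    r"(?P<secs>\d+(?:\.\d+)?)s"
)


def rank_connections(lines):
    """Return (app, uid, host, seconds) tuples, sorted by seconds, largest first.

    Lines that are not individual profile entries are ignored.
    """
    rows = []
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:
            rows.append((
                match.group("app"),
                int(match.group("uid")),
                match.group("host"),
                float(match.group("secs")),
            ))
    return sorted(rows, key=lambda row: row[3], reverse=True)


sample = [
    "Mon Oct 12 17:39:46 2009: Individual user profiles:",
    "Mon Oct 12 17:39:46 2009: 'isql' (uid = 0) time on omnihost1.ibm.com: 3.770000s",
    "Mon Oct 12 17:39:46 2009: 'PROBE' (uid = 0) time on probehost.ibm.com: 5.010000s",
    "Mon Oct 12 17:39:46 2009: 'e@c0B4D@c0142:11.0' (uid = 0) time on omnihost1.ibm.com: 10.010000s",
]
for app, uid, host, secs in rank_connections(sample):
    print(f"{secs:10.6f}s  {app} (uid {uid}) on {host}")
```

Sorting by time keeps the focus on the few connections that dominate a period, which is usually where tuning of filters or rules files pays off first.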
Analyze the profiling statistics in the log file and database table to identify which clients are using the most time and why:
  • Determine whether all the client connections are necessary, and drop any redundant client connections; for example, event lists that are left connected after operators have vacated the premises.
  • If a desktop event list or a Web GUI client is using a lot of time, focus on the filters that are being used by that client. Analyze the filters both for the number and complexity of the individual queries, with the aim of making them more efficient.
  • If the client is a probe, performance degradation might be due to poorly written rules files that allow unnecessary events to be forwarded to the ObjectServer, the amount of detail information sent per event, or event flooding.
  • Increase the granularity period of the ObjectServer to alleviate the effects of heavy client loads. This action slows down the rate at which the ObjectServer sends IDUC broadcasts to its clients, and can lead to improved system performance. However, incoming events will take longer to reach clients, particularly if the ObjectServer is part of a multitiered architecture.