IBM Support

Error: "Warning: Cannot record event - cannot keep up with event occurrence rate!"

Technote (FAQ)


Question

The following message appears, sporadically or frequently, on the Lotus Domino server console:

Warning: Cannot record event - cannot keep up with event occurrence rate!


It does not appear in the Miscellaneous Events view of the log.nsf database. It will appear in the debug_outfile if this parameter is set in the notes.ini file, and it will appear in redirected console output on Unix systems.

The Event Monitor task on the Domino server is started by "load event" on the console, or with "event" on the "ServerTasks" line in the notes.ini. Once the Event Monitor is started, it processes events on the server, as configured in events4.nsf. Other server processes, when generating events, issue this warning message when the memory pool for queueing unprocessed events is full.

Answer

The basic problem is that the Event Monitor process on this server, or the server itself, is too busy. One of the following may relieve the problem:

  • Fix the errors that are occurring on the server. If certain server processes discover a lot of errors at certain times, such as at server startup or when daily scheduled agents run, that long list of errors will put a temporary burden on the event processing mechanisms.

  • Reduce the number of event notifications on the server. If there are multiple actions to take on each event, events may not be cleared quickly. Do not ask for notification on lower severity events that you are going to ignore anyway.

  • Do not use more expensive notification methods if they can be invoked frequently. Running a program is expensive. Relaying to another server requires a network transaction. Substitute less expensive notification methods, such as logging to a database.

  • Diversify the event notification load on the server. A single thread handles each type of notification action, so spreading notifications across several action types spreads the work across several threads.

  • Reduce the number of Monitors and Probes running on the server, which may be generating more events.

  • Add Suppression Times to Event Messages that are flooding the server, so each occurrence doesn't cause a notification.

  • Improve the performance of the server through standard means - eliminate excess processes, add memory, reorganize disk configurations - all through an effective performance management program. Event Monitoring may be part of the overload.

  • Reduce the amount of logging done on the server. Every message written to the Miscellaneous Events view is an event, which will be written to the Event Pool, then checked in events4.nsf by the Event Monitor task to see if the Type and Severity for the message have Event Notifications requested. Even messages not triggering notifications may trigger the "Cannot record event" message.


Excessive notification is the most important area to check. The thread for each type of notification has a built-in delay to prevent flooding the notification channel and to release the server to do other work. Log to a database, Mail, Broadcast, Relay, and Run a program each wait 1/10 second between notifications. Pager, NT Event Log, and Unix System Log each wait 1/2 second between notifications. This means that each server can process fewer than 10 notifications per second for each type in the first group, and fewer than 2 per second for each type in the second group. To check the speed of notification processing, do one of the following:
  • In a database to which you are logging events, compare the time the event occurred with the time the document was created.

  • Change a Run program notification to something that will write a time stamp somewhere, so you can check the delay to execution.
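The throughput ceiling implied by these delays can be worked out directly. The following Python sketch is illustrative only: the delay values come from the paragraph above, while the function names and the backlog of 600 notifications are assumptions made for the example. It ignores the time each notification itself takes, so real rates will be lower.

```python
from fractions import Fraction

# Built-in wait between notifications, in seconds, as stated in this technote.
NOTIFICATION_DELAY = {
    "Log to a database": Fraction(1, 10),
    "Mail": Fraction(1, 10),
    "Broadcast": Fraction(1, 10),
    "Relay": Fraction(1, 10),
    "Run a program": Fraction(1, 10),
    "Pager": Fraction(1, 2),
    "NT Event Log": Fraction(1, 2),
    "Unix System Log": Fraction(1, 2),
}


def max_rate_per_second(method):
    """Upper bound on notifications per second for one method (one thread each)."""
    return 1 / NOTIFICATION_DELAY[method]


def min_drain_seconds(method, backlog):
    """Lower bound on the time needed to clear a backlog of queued notifications."""
    return backlog * NOTIFICATION_DELAY[method]


if __name__ == "__main__":
    for method in NOTIFICATION_DELAY:
        print(f"{method}: at most {max_rate_per_second(method)} per second")
    # Hypothetical burst of 600 mail notifications.
    print(float(min_drain_seconds("Mail", 600)), "seconds at best")
```

A burst that arrives faster than these rates stays queued in the Event Pool until the notification thread catches up.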

Examples of both those steps are included below. The amount of memory the Event Monitor needs to queue an individual event varies, but it must include all the information that is in an event document logged to statrep.nsf. Those documents range from 100 to 200 bytes, which is a good estimate of the memory needed per event. This memory is needed from the original occurrence until the lookup by the Event Monitor, and again for each notification that must be processed. To examine how the event memory is being used, try the following:
  • Issue the Domino console command "tell event dump" to see the amount of event memory that is allocated and in use.

  • Set the Event_Pool_Size parameter in notes.ini, as described below, to increase the amount of memory to queue events.

  • Set the Debug_Event=1 parameter in notes.ini, to look at the amount of Event Pool used just to cache the server's standard notification setup.

  • Also use Debug_Event=1 over problem periods to look at what is generating the event and notification activity on the server.
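As a rough sizing aid, the 100 to 200 byte estimate can be turned into an approximate queue capacity. This Python sketch is an assumption-laden illustration, not a description of how Domino allocates the pool. The function name is hypothetical, and the calculation ignores the memory taken by the cached notification setup and by event suppression.

```python
def events_that_fit(pool_bytes, bytes_per_event):
    """Rough count of queued entries that fit in the pool, ignoring overhead."""
    if bytes_per_event <= 0:
        raise ValueError("bytes_per_event must be positive")
    return pool_bytes // bytes_per_event


if __name__ == "__main__":
    pool = 5242880  # 5 MB, the R5/R6 default cited in this technote
    for size in (100, 200):
        print(size, "bytes per event ->", events_that_fit(pool, size), "entries")
```

Because each pending notification also holds memory, an event with several notification actions consumes several entries.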


Standard memory leak debugging techniques are a last resort for the Event Pool, to be used only after the steps above have eliminated event system overloading. The Event Pool will grow during periods of frequent events, or during periods when notification processing backs up. The Event Pool memory blocks will not be released after that peak, but the memory within them will be reusable for queues of events and notifications. Therefore the pool may appear to be leaking, when the server is just experiencing larger and larger processing backlogs during event bursts.

Run Domino server memory dumps (load server -m), or run NSD with Memcheck. (Memcheck must be installed on Unix systems. NSD, which includes Memcheck, must be installed on Windows systems. Both are available from Lotus Support when troubleshooting suspected memory leaks.) Provide the memory statistics from "tell event dump" after the Event Monitor has started, and again after the "Cannot record event" messages appear, to indicate actual use of the pool memory, not just allocation of the blocks. Provide the server events4.nsf setup, and a copy of a database to which the server is logging events. The database copy must be taken at the OS level to preserve the document time stamps, but it can be an OS-level copy of a complete replica. Include the description and results of any experiments that have been run to verify that the server can keep up with the configured run program notifications.

Supporting Information:

The backup of event notifications that log to a database can be checked using the Document Properties of the documents in the database. This example is from statrep.nsf on a deliberately overloaded system. The event was at 1:08:30, but the log action was 3 seconds later, both on the same server.
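The comparison between the event time and the document creation time is simple subtraction. The Python sketch below shows one way to compute it once both values have been copied out of the Document Properties. The function name, the timestamp format, and the date and PM suffix in the usage example are assumptions; only the 1:08:30 event time and the 3 second gap come from the example above.

```python
from datetime import datetime


def logging_delay_seconds(event_time, created_time, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Seconds between an event occurring and its log document being created."""
    occurred = datetime.strptime(event_time, fmt)
    created = datetime.strptime(created_time, fmt)
    return (created - occurred).total_seconds()


if __name__ == "__main__":
    # Hypothetical date and PM suffix; the times mirror the statrep.nsf example.
    print(logging_delay_seconds("02/13/2001 01:08:30 PM", "02/13/2001 01:08:33 PM"))
```

A delay that grows steadily during a burst suggests that the logging thread is falling behind.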



Here is an example of using an event notification to run a program that will do a minimum amount of work, but leave a timestamp in the log to figure out the delay after an event to queue and retrieve the notification, and to load and run a program. This can be tried on a given server at rest or under various loads.

    02/13/2001 11:44:34 AM Periodic full text indexer not started - Incorrect argument count
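An alternative to relying on a log message is a tiny program that records when it ran. The Python sketch below is one possibility, assuming a Python interpreter is installed on the server and that the Run program notification is configured to call the script with a file path. The script, the function name, and the file path are all hypothetical and are not part of Domino.

```python
import sys
from datetime import datetime


def write_stamp(path, now=None):
    """Append one line with the current local time to the file at `path`."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp} notification program ran\n")
    return stamp


if __name__ == "__main__":
    if len(sys.argv) == 2:
        write_stamp(sys.argv[1])
    else:
        print("usage: stamp.py <output file>", file=sys.stderr)
```

Comparing the recorded time with the time of the triggering event gives the combined delay for queueing, retrieving the notification, and loading and running the program.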




    A Domino console command is available to check the amount of memory allocated and used to queue events. This produces debug output, which will not be written to the log. Either capture the output from the console window, or see the techniques described under Debug_Event=1 below.

    > tell event dump

    Event pool size = 106496 bytes, used = 94810 bytes

    ACLWatchCount: 1
    DbName Severity Monitor #
    names.nsf 3 AAAA-4TQ4GA
    Replication Monitors Cached: 1
    DbName Interval Severity Next Check Time Monitor #
    names.nsf 24 3 02/13/2001 10:40:23 AM AAAA-4TQ4GB
    No file check parameters to dump
    No server access check parameters to dump
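When console output has been captured to a file, the pool line can be extracted and turned into a utilization figure. The Python sketch below assumes the line has exactly the format shown in the sample above. The function name is hypothetical.

```python
import re

POOL_LINE = re.compile(r"Event pool size = (\d+) bytes, used = (\d+) bytes")


def pool_usage(console_text):
    """Return (size, used, fraction_used) from captured output, or None if absent."""
    match = POOL_LINE.search(console_text)
    if match is None:
        return None
    size, used = int(match.group(1)), int(match.group(2))
    fraction = used / size if size else 0.0
    return size, used, fraction


if __name__ == "__main__":
    sample = "Event pool size = 106496 bytes, used = 94810 bytes"
    size, used, fraction = pool_usage(sample)
    print(f"{used} of {size} bytes in use ({fraction:.1%})")
```

Comparing this figure right after the Event Monitor starts with the figure after the warnings appear separates actual use from block allocation.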

    Event_Pool_Size

    The default for Event_Pool_Size in R5 and R6 is 5242880 (5 MB), which is also the maximum size to which the memory pool can grow for unprocessed events, unprocessed notifications, and event suppression memory. When trying to determine whether the problem is an event overload or a memory leak, check the Event_Pool_Size= parameter. Ensuring that it is set to the maximum of 5 MB will help if there is a temporary overload from which the Event Monitor can recover. If the number of events, the number of notifications, the server workload, or the server performance is such that the Event Monitor cannot recover, or if there is actually a memory leak in the pool, this will only prolong the inevitable. If it does help, it buys time to implement the steps above.

    Changes in this parameter require quitting and restarting the server, or use of the "restart server" command. More specifically, it requires release of the Domino shared memory in which the pool is stored. This means that any Notes client, Domino program, or Notes API program from another vendor, which is running from the same Domino data directory, must also be stopped. If there are several such programs from other vendors running (as with Windows NT services), it may be simpler to shut down and restart the system.

    In R5 and Notes/Domino 6, the maximum value that can be set is Event_Pool_Size=5242880 (5 MB).

    Do not use commas in the value. Setting Event_Pool_Size=5,242,880 on AIX or iSeries prevents HTTP from starting. Set the parameter as Event_Pool_Size=5242880 instead.

    Note: The event task pool has a maximum value of 5MB in both R5 and Notes/Domino 6 and the default is 5MB.

    Note: The default size of EVENT_POOL_SIZE for 8.5.x has been raised to 10MB (10,485,760 bytes) and the limit on this setting is 100MB (104,857,600 bytes).
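The rules above (plain digits, no commas, and a release-dependent limit) can be checked before notes.ini is edited. The Python sketch below is a hypothetical helper, not an IBM tool. The constant and function names are assumptions, and the limits are the ones quoted in this technote.

```python
MAX_BYTES_R5_R6 = 5242880    # 5 MB limit stated for R5 and Notes/Domino 6
MAX_BYTES_8_5 = 104857600    # 100 MB limit stated for 8.5.x


def check_event_pool_size(value, limit):
    """Return a list of problems found in an Event_Pool_Size value string."""
    problems = []
    text = value.strip()
    if "," in text:
        problems.append("contains commas; write plain digits only")
    digits = text.replace(",", "")
    if not digits.isdigit():
        problems.append("is not a whole number of bytes")
    elif int(digits) > limit:
        problems.append(f"exceeds the limit of {limit} bytes")
    return problems


if __name__ == "__main__":
    for candidate in ("5242880", "5,242,880", "10485760"):
        print(candidate, "->", check_event_pool_size(candidate, MAX_BYTES_R5_R6) or "ok")
```

An empty result means the value passes these checks. It does not confirm that the server will accept the value.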

    Debug_Event=1

    This produces a lot of output, which will show all the notifications being loaded when the Event Monitor starts, show the amount of Event Pool consumed initially, and show what is generating events on the server. You must take steps to capture the output, which will not be written to the log.nsf database. If you are running Domino on a Unix operating system or Unix services, and you use "tee" or otherwise redirect and save the console output, it will be captured. If not, you may want to combine this with the "Debug_Outfile" parameter. If you are trying to find the source of events, you may want to combine this with "Debug_Threadid=1" to see the issuing process and thread.

    You can assess the amount of memory needed just to store the monitoring and notification configuration in events4.nsf while the server is running. Stop the Event Monitor ("tell event quit"), set the parameter ("set config Debug_Event=1"), start the Event Monitor ("load event"), look for the pool sizes ("Event pool size = 90112 bytes, used = 84214 bytes"), and then reset and restart ("tell event quit", "set config Debug_Event=", "load event").

    Here is some sample output from Debug_Event=1, with Debug_Threadid=1, showing the default events4.nsf configuration, and other threads actually issuing the "Cannot record event" message. Event_Pool_Size was artificially limited, and Agent Manager messages were added to events4.nsf.
      [040C:0002-03FC] 02/07/2001 12:35:07 PM Event Monitor started
      [040C:0002-03FC] EVENT: Closed Config DB
      [040C:0002-03FC] EVENT: Reopening Config DB Without SCANLOCK
      [040C:0002-03FC] TYPE+SEVERITY: Comm 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Comm 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Comm 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Security 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Security 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Security 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Mail 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Mail 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Mail 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Replica 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Replica 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Replica 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Resource 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Resource 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Resource 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Misc 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Misc 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Misc 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Server 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Server 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Server 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Statistic 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Statistic 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Statistic 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Update 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Update 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Update 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: DataBase 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: DataBase 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: DataBase 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Network 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Network 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Network 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Compiler 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Compiler 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Compiler 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Router 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Router 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Router 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Agent 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Agent 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Agent 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Client 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Client 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Client 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Addin 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Addin 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Addin 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: AdminP 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: AdminP 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: AdminP 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Web 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Web 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Web 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: News 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: News 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: News 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Ftp 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Ftp 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: Ftp 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: LDAP 1 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: LDAP 2 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] TYPE+SEVERITY: LDAP 3 Log statrep.nsf No Schedule A1/Test
      [040C:0002-03FC] Event pool size = 65536 bytes, used = 62080 bytes

      [040C:0002-03FC] ACLWatchCount: 1
      [040C:0002-03FC] DbName Severity Monitor #
      [040C:0002-03FC] names.nsf 3 AAAA-4TPU8X
      [040C:0002-03FC] Replication Monitors Cached: 1
      [040C:0002-03FC] DbName Interval Severity Next Check Time Monitor #
      [040C:0002-03FC] names.nsf 24 3 02/07/2001 12:35:08 PM AAAA-4TPU8Y
      [040C:0002-03FC] No file check parameters to dump
      [040C:0002-03FC] No server access check parameters to dump

      .....
      > l amgr

      [060C:0002-03DC] 02/07/2001 12:36:12 PM AMgr: Executive '5' started
      [040C:0002-03FC] Looking up error code AGENT MANAGER0x3345
      [01EC:0002-03D0] 02/07/2001 12:36:12 PM Agent Manager started
      [040C:0002-03FC] Looking up error code AGENT MANAGER0x3314
      [039C:0002-01CC] 02/07/2001 12:36:13 PM AMgr: Executive '2' started
      [04A4:0002-0454] 02/07/2001 12:36:13 PM AMgr: Executive '3' started
      [04B8:0002-0570] 02/07/2001 12:36:13 PM AMgr: Executive '4' started
      [0494:0002-0528] 02/07/2001 12:36:13 PM AMgr: Executive '1' started
      [040C:0002-03FC] Looking up error code AGENT MANAGER0x3345

      .....
      > tell amgr q

      [0494:0002-0528] 02/07/2001 12:41:18 PM AMgr: Executive '1' shutting down
      [039C:0002-01CC] 02/07/2001 12:41:18 PM AMgr: Executive '2' shutting down
      [04A4:0002-0454] Warning: Cannot record event - cannot keep up with event occurrence rate!

      [04A4:0002-0454] 02/07/2001 12:41:18 PM AMgr: Executive '3' shutting down
      [04B8:0002-0570] Warning: Cannot record event - cannot keep up with event occurrence rate!

      [060C:0002-03DC] 02/07/2001 12:41:18 PM AMgr: Executive '5' shutting down
      [04B8:0002-0570] 02/07/2001 12:41:18 PM AMgr: Executive '4' shutting down
      [040C:0002-03FC] Looking up error code AGENT MANAGER0x3346
      [01EC:0002-03D0] 02/07/2001 12:41:19 PM Agent Manager shutdown complete
      [040C:0002-03FC] Looking up error code AGENT MANAGER0x3315
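With Debug_Threadid=1 in effect, captured output like the sample above can be tallied to see which bracketed prefixes issued the warning most often. The Python sketch below assumes the prefix format shown in the sample. The function name is hypothetical, and the sketch does not interpret the individual fields inside the brackets.

```python
import re
from collections import Counter

# Bracketed prefix added to each line, for example "[04A4:0002-0454]".
THREAD_PREFIX = re.compile(
    r"^\s*\[([0-9A-Fa-f]+:[0-9A-Fa-f]+-[0-9A-Fa-f]+)\]\s*(.*)$"
)
WARNING_TEXT = "Cannot record event"


def warnings_by_thread(lines):
    """Count 'Cannot record event' warnings per bracketed prefix."""
    counts = Counter()
    for line in lines:
        match = THREAD_PREFIX.match(line)
        if match and WARNING_TEXT in match.group(2):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[04A4:0002-0454] Warning: Cannot record event - cannot keep up with event occurrence rate!",
        "[04A4:0002-0454] 02/07/2001 12:41:18 PM AMgr: Executive '3' shutting down",
        "[04B8:0002-0570] Warning: Cannot record event - cannot keep up with event occurrence rate!",
    ]
    for prefix, count in warnings_by_thread(sample).most_common():
        print(prefix, count)
```

Matching the busiest prefixes against the other lines they issued shows which server task was generating events at the time.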


      Additional debug command
      The following console command lists the messages currently waiting in the queue for each event type:

      Run "te event queuedump event.dmp" on the server console. The output shows how many messages are in the queue for each event type. In the example below, PROBEMGR is the queue with the problem. Identifying the backed-up queue in this way resolved the issue for one customer.


      Sample output:

      Event pool size = 10530366 bytes, used = 9668394 bytes
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - HTTPQueue
      Process ID - 776 (External)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - EventDispatcher
      Process ID - 1952 (Event Task)
      Message count - 1
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - EVENTLOG
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - LOG
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - MAIL
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - BROADCAST
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - RELAY
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - ACLWATCH
      Process ID - 1952 (Event Task)
      Message count - 0
      Signature - 0x0000DADA (Valid)
      *********************************************************
      Queue Name - PROBEMGR
      Process ID - 1952 (Event Task)
      Message count - 18715
      ------------------------------------------
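For a long dump, the queue with the largest backlog can be picked out programmatically. The Python sketch below assumes that the dump file has the same "Queue Name" and "Message count" lines as the sample output above. The function names are hypothetical.

```python
import re

QUEUE_NAME = re.compile(r"Queue Name - (\S+)")
MESSAGE_COUNT = re.compile(r"Message count - (\d+)")


def queue_depths(dump_text):
    """Map each queue name to the message count that follows it in the dump."""
    depths = {}
    current = None
    for line in dump_text.splitlines():
        name = QUEUE_NAME.search(line)
        if name:
            current = name.group(1)
            continue
        count = MESSAGE_COUNT.search(line)
        if count and current is not None:
            depths[current] = int(count.group(1))
            current = None
    return depths


def busiest_queue(dump_text):
    """Return (queue name, message count) for the deepest queue, or None."""
    depths = queue_depths(dump_text)
    if not depths:
        return None
    return max(depths.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = (
        "Queue Name - MAIL\nProcess ID - 1952 (Event Task)\nMessage count - 0\n"
        "Queue Name - PROBEMGR\nProcess ID - 1952 (Event Task)\nMessage count - 18715\n"
    )
    print(busiest_queue(sample))
```

A queue whose count keeps growing between successive dumps is the one to investigate first.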

      Related information

      How Should the DEBUG_OUTFILE Parameter Be Implemented?
      Is the notes.ini parameter debug_threadid dynamic?
      How Is a Memory Dump Taken?
      Memcheck memory analyzer

      Historical Number

      184414

      Document information

      More support for: IBM Domino
      Administration

      Software version: 7.0, 8.0, 8.5, 9.0

      Operating system(s): AIX, Linux, Solaris, Windows

      Reference #: 1097225

      Modified date: 04 February 2010