IBM Support

Receiving EZZ7968I Message

Troubleshooting


Problem

The EZZ7968I HELLO INTERVAL MISSED ON INTERFACE interface_name message is received, followed by EZZ7921I OSPF ADJACENCY FAILURE messages indicating that contact has been lost with an OSPF neighbor.

Cause

OMPROUTE implements the Open Shortest Path First (OSPF) protocol. OSPF neighbors exchange hello packets at a regular coded interval (Hello_Interval) to verify the availability of the network links. If a neighbor does not respond within the coded dead router interval (Dead_Router_Interval), the neighbor is considered down.

The EZZ7968I message is displayed when OMPROUTE has detected that the specified interface has missed sending hellos at the configured Hello_Interval and is at least one hello away from the Dead_Router_Interval. For example, if Hello_Interval=10 and Dead_Router_Interval=40 are coded, OMPROUTE should send four hellos to a neighbor in each dead router interval, that is, one hello every 10 seconds. OMPROUTE records a time stamp for the last hello sent.

Using Hello_Interval=10 and Dead_Router_Interval=40, assume that OMPROUTE was not dispatched for two outbound hello intervals. After OMPROUTE has been redispatched and the outbound hello timer has popped to send the third hello, OMPROUTE compares the current time stamp for the third hello with the time stamp of the last hello sent and determines that it has missed two hello intervals. OMPROUTE is also now just one hello away from the dead router interval. As a consequence, OMPROUTE issues message EZZ7968I as a warning that it has missed the hello intervals for sending hello packets to the neighbors over the specified interface. Note that these missed hello intervals have nothing to do with inbound hellos sent by the neighbors, even though the neighbors may have similarly configured hello interval values.
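The timing comparison described above can be sketched in Python. This is an illustrative model only, not OMPROUTE's actual implementation: the class name HelloTimer, its method names, and the warning threshold (at least one missed interval and no more than one hello interval left before the dead router interval expires) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class HelloTimer:
    """Hypothetical model of the outbound hello check (not OMPROUTE code)."""

    hello_interval: int        # seconds, as coded on Hello_Interval
    dead_router_interval: int  # seconds, as coded on Dead_Router_Interval

    def missed_intervals(self, last_sent: float, now: float) -> int:
        """Hello intervals that passed without a hello being sent."""
        elapsed = now - last_sent
        # The first elapsed interval is the normal gap between two hellos;
        # every additional whole interval counts as missed.
        return max(0, int(elapsed // self.hello_interval) - 1)

    def hellos_until_dead(self, last_sent: float, now: float) -> int:
        """Whole hello intervals left before the dead router interval expires."""
        remaining = self.dead_router_interval - (now - last_sent)
        return max(0, int(remaining // self.hello_interval))

    def should_warn(self, last_sent: float, now: float) -> bool:
        """Assumed threshold: a hello was missed and at most one is left."""
        return (self.missed_intervals(last_sent, now) >= 1
                and self.hellos_until_dead(last_sent, now) <= 1)


timer = HelloTimer(hello_interval=10, dead_router_interval=40)
# Last hello sent at t=0; OMPROUTE is redispatched when the timer pops at t=30.
print(timer.missed_intervals(0, 30))   # 2
print(timer.hellos_until_dead(0, 30))  # 1
print(timer.should_warn(0, 30))        # True
```

With the values from the example, 30 seconds have elapsed since the last hello, so two of the three elapsed intervals were missed and only one 10-second interval remains before the 40-second dead router interval expires.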

Diagnosing The Problem

When the EZZ7968I message is displayed, it always indicates that OMPROUTE has not had sufficient CPU cycles to send outbound hellos. The cause is one of the following: OMPROUTE is not configured to run at its optimal setting, there is a general lack of resources on the system, or other processes are consuming the CPU.

Resolving The Problem

From OMPROUTE's perspective, all that can be done is to optimize OMPROUTE so that it can obtain the necessary cycles. Check or set the following:

  1. Ensure that OMPROUTE is configured as non-swappable in the PPT. For example:


    PPT PGMNAME(OMPROUTE) NOSWAP

  2. If not using WLM, ensure that the dispatching priority for the OMPROUTE address space is set at or just below the TCP/IP value. If using WLM, ensure that OMPROUTE is running in the SYSSTC service class (that is, the same service class as TCP/IP).

  3. If not using WLM, ensure that the dispatching priority for the OMVS address space is set at or just above the TCP/IP value. If using WLM, ensure that the SYSTEM service class is used for OMVS. OMVS might run fine using the SYSSTC service class; however, processing performed by OMVS is critical for many other processes within the system, especially those using TCP/IP services (including TCP/IP itself and OMPROUTE). If higher-priority processes block execution in OMVS, many other activities can be suspended.



  4. If not using WLM, ensure that the dispatching priority for the SYSLOGD address space is set at or below the TCP/IP value. If using WLM, ensure that SYSLOGD is assigned the same service class as TCP/IP (SYSSTC), because it provides services to many other address spaces (including TCP/IP and OMPROUTE).

  5. Ensure that PGM=OMPROUTE is coded in OMPROUTE's start procedure. Applications started using JCL with PGM=BPXBATCH might not be treated as started tasks and might not receive the proper dispatching priority. Therefore, it is strongly recommended that OMPROUTE be started using JCL with PGM=OMPROUTE.

  6. If using OSPF on external network devices (OSAs), ensure that the Router_Priority values on the corresponding OSPF_Interface statements are lower than the values on the neighboring network routers, so that OMPROUTE is less eligible to become the Designated Router (DR) or Backup Designated Router (BDR) on the networks attached through these OSA devices. Note that Router_Priority defaults to 1, the lowest priority that is still eligible. However, it is preferable to set Router_Priority explicitly to 0 so that OMPROUTE never becomes a DR or BDR on an OSA device; the designated router task is best left to a hardware router that has the workload capacity.

  7. If sysplex autonomics is enabled, ensure that the WLM policy for the OMPROUTE address space receives sufficient resources in relation to other work being managed on the system. Under high load conditions, it is possible that OMPROUTE, if not properly classified, can trigger an autonomic response from the TCP/IP stack it has an affinity with, resulting in the TCP/IP address space removing itself from the sysplex group. For this reason, ensure that TCP/IP and OMPROUTE address spaces are placed in the SYSSTC service class. Classification in another service class will leave the system vulnerable to a sysplex distributor outage.

The steps above will optimize OMPROUTE's performance. If the EZZ7968I message is still received, this indicates that there is a shortage of system resources beyond the scope of OMPROUTE.

Search the syslog backwards from the time the EZZ7968I message was issued, covering at least one dead router interval, for hanging, looping, or long-running jobs that might be monopolizing resources.
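As a rough illustration of that search window, the following Python sketch computes the span of time to review and selects the log entries that fall inside it. It assumes the syslog has already been parsed into (timestamp, text) pairs; the function names and the sample entries are hypothetical and only serve this example.

```python
from datetime import datetime, timedelta


def search_window(message_time, dead_router_interval):
    """Return (start, end) of the span to review, ending at the EZZ7968I time."""
    start = message_time - timedelta(seconds=dead_router_interval)
    return start, message_time


def entries_in_window(entries, message_time, dead_router_interval):
    """Keep only the parsed (timestamp, text) entries inside the window."""
    start, end = search_window(message_time, dead_router_interval)
    return [(ts, text) for ts, text in entries if start <= ts <= end]


# Made-up sample data: EZZ7968I issued at 12:00:40 with Dead_Router_Interval=40.
issued = datetime(2024, 1, 1, 12, 0, 40)
sample = [
    (datetime(2024, 1, 1, 11, 59, 50), "entry outside the window"),
    (datetime(2024, 1, 1, 12, 0, 5), "entry inside the window"),
]
for ts, text in entries_in_window(sample, issued, 40):
    print(ts.time(), text)  # 12:00:05 entry inside the window
```

The window is a lower bound: if nothing suspicious appears within one dead router interval, widening the search further back is a reasonable next step.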

The MVS system might require tuning to ensure that all processes are running at the correct dispatching priority, system performance has been optimized, and that there are enough resources for the workload on the system.

As a last resort, the hello interval and dead router interval can be increased to eliminate the EZZ7968I message; however, these values must then be changed on all of OMPROUTE's neighbors to match the new values. This also only masks the performance problem on the system instead of resolving it.
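Because the timers must match on every neighbor, it can help to compare the planned values against each neighbor before making the change. The Python sketch below is only an illustration: the function name, the (hello, dead) tuple layout, and the router names are assumptions for this example, and the neighbor values would have to be collected by hand or from each router's own configuration.

```python
def mismatched_neighbors(local, neighbors):
    """Return the names of neighbors whose (hello, dead) timers differ from local."""
    return sorted(name for name, timers in neighbors.items() if timers != local)


# Hypothetical values: OMPROUTE raised to Hello_Interval=20, Dead_Router_Interval=80.
local_timers = (20, 80)
neighbor_timers = {
    "routerA": (20, 80),  # already updated
    "routerB": (10, 40),  # still on the old values
}
print(mismatched_neighbors(local_timers, neighbor_timers))  # ['routerB']
```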


Document Information

Modified date:
22 June 2018

UID

swg21380007