IBM Support

What is nsf.lock mutex contention?

Troubleshooting


Problem

This article describes nsf.lock mutex contention. User threads can stop processing while they wait on the nsf.lock mutex for a network connection.

Cause


What is nsf.lock mutex contention?

The nsf.lock mutex protects the Network Shared Files (NSF) table. It is a global mutex that must be acquired when searching for the file descriptor (fd) of a connection. Whenever there is a new connection, the listener thread, which runs on a CPU VP, creates a new fd for that connection thread. When a thread migrates to a different CPU VP, that VP's process acquires the nsf.lock mutex, requests the fd information for the connection from the other CPU VP so that it can talk with the client, and then releases the mutex.

If many user threads are waiting to acquire the nsf.lock mutex, response time can be slow for all connections.
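
The effect of a single global mutex can be illustrated with a toy model. The sketch below is not Informix code: the class name SharedFdTable, the function run_lookups, and the fd values are hypothetical and exist only to show that every lookup, from any thread, passes through the same lock, so lookups are serialized no matter how many threads issue them.

```python
import threading


class SharedFdTable:
    """Toy stand-in for one table guarded by one global mutex (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of nsf.lock
        self._fds = {}                 # connection id -> fd
        self.lookups = 0               # number of serialized lookups so far

    def register(self, conn_id, fd):
        with self._lock:
            self._fds[conn_id] = fd

    def lookup(self, conn_id):
        # Every caller must hold the same lock, so lookups never overlap.
        with self._lock:
            self.lookups += 1
            return self._fds[conn_id]


def run_lookups(table, conn_ids, n_threads):
    """Look up every connection id using n_threads worker threads."""
    results = []
    results_lock = threading.Lock()

    def worker(ids):
        for conn_id in ids:
            fd = table.lookup(conn_id)
            with results_lock:
                results.append((conn_id, fd))

    # Give each worker an interleaved slice of the connection ids.
    chunks = [conn_ids[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Adding worker threads to this model does not add lookup capacity, because only one thread at a time can be inside lookup(). That is the general pattern behind the slow response times described above.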

Diagnosing The Problem


You can verify this contention with the following onstat commands:

  • onstat -g ath


    Example:

$  onstat -g ath

13   178b5698 0        3 mutex wait nsf.lock 1cpu   tlitcplst
1184 1800d720 1735cb34 2 mutex wait nsf.lock 3cpu   sqlexec
1313 17a00330 1735c530 3 running             6cpu   sqlexec


  • onstat -g lmx


    Example:

$  onstat -g lmx

Locked mutexes:
mid      addr     name           holder   lkcnt  waiter   waittime
3258     17389dc8 nsf.lock       1313     0      13     491
                                                       1184
                                                       188

   
  • onstat -g wmx


    Example:

$  onstat -g wmx

Number of mutexes on VP free lists: 166

Mutexes with waiters:
mid      addr     name         holder   lkcnt  waiter   waittime
3258     17389dc8 nsf.lock       1313     0      13       491
                                                          1184
                                                          188

Number of mutexes on VP free lists: 166
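
When the onstat -g ath output is long, the waiting threads can be picked out with a script. The following is a minimal sketch, assuming the column layout shown in the example above (thread id, two addresses, priority, status, VP class, thread name). The function name threads_waiting_on and the regular expression are illustrative and are not part of any Informix tool.

```python
import re

# Matches thread lines whose status is "mutex wait <mutex name>".
# Assumed layout: tid, tcb, rstcb, priority, status, vp-class, name.
ATH_MUTEX_WAIT = re.compile(
    r"^\s*(?P<tid>\d+)\s+\S+\s+\S+\s+\d+\s+mutex wait\s+"
    r"(?P<mutex>\S+)\s+(?P<vp>\S+)\s+(?P<name>\S+)"
)


def threads_waiting_on(ath_output, mutex="nsf.lock"):
    """Return (thread id, vp class, thread name) for each thread waiting on mutex."""
    waiters = []
    for line in ath_output.splitlines():
        match = ATH_MUTEX_WAIT.match(line)
        if match and match.group("mutex") == mutex:
            waiters.append(
                (int(match.group("tid")), match.group("vp"), match.group("name"))
            )
    return waiters


# The thread lines from the onstat -g ath example in this article.
SAMPLE = """\
13   178b5698 0        3 mutex wait nsf.lock 1cpu   tlitcplst
1184 1800d720 1735cb34 2 mutex wait nsf.lock 3cpu   sqlexec
1313 17a00330 1735c530 3 running             6cpu   sqlexec
"""
```

Calling threads_waiting_on(SAMPLE) returns the entries for threads 13 and 1184. Thread 1313 is not returned because it is running; it is the holder shown in the onstat -g lmx and onstat -g wmx examples.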

Resolving The Problem

The exchange of fd information between the CPU VPs is handled by a set of functions that run in the context of a poll thread. This set of functions is often called the 'fd-server'. The nsf.lock mutex contention is usually caused by multiple fd-servers accessing the global NSF table. By default, an fd-server runs on each of the first 10 poll threads for a particular network protocol.

How to alleviate/prevent the nsf.lock mutex contention

  • In versions 10.00.xC8 (and older) or 11.10.xC2 (and older), the number of fd-servers cannot be configured, so the only way to decrease it is to decrease the number of poll threads for a particular network protocol in the NETTYPE configuration parameter.
  • In versions 10.00.xC9, 11.10.xC3 and 11.50.xC1, a new (undocumented) configuration parameter, NUMFDSERVERS, was introduced to make the number of fd-servers configurable. However, because there is still only one global NSF table, contention on the nsf.lock mutex remains whenever more than one poll thread acts as an fd-server. In these versions, set this parameter to 1 in the ONCONFIG file so that only the first poll thread serves as the fd-server.
  • The preferred solution is to upgrade to Informix Server 11.70, where NSF table handling was significantly improved to remove the contention on the nsf.lock mutex (each fd-server now has its own NSF table), allowing for better throughput (see the link in 'Related URL').
  • This contention appears when many new connections arrive at once and their threads migrate across the CPU VPs. One mitigation is to distribute new connections evenly over time at the application level, rather than opening them all at once.
  • Consider implementing the Informix MaxConnect product, which is designed to handle a large number of incoming application connections and concentrate them into a small number of connections to the database server.
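
To review the settings mentioned above, the ONCONFIG file can be scanned for NUMFDSERVERS and for the poll thread count in each NETTYPE entry. The following is a minimal sketch, assuming the usual NETTYPE layout of protocol,poll_threads,connections,vp_class and '#' as the comment character. The function name fd_server_settings and the sample values are hypothetical.

```python
def fd_server_settings(onconfig_text):
    """Return (NUMFDSERVERS value or None, {protocol: poll thread count})."""
    numfdservers = None
    poll_threads = {}
    for raw in onconfig_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts[0].upper(), parts[1].strip()
        if name == "NUMFDSERVERS":
            numfdservers = int(value)
        elif name == "NETTYPE":
            # Assumed layout: protocol,poll_threads,connections,vp_class
            fields = [field.strip() for field in value.split(",")]
            if len(fields) >= 2 and fields[1].isdigit():
                poll_threads[fields[0]] = int(fields[1])
    return numfdservers, poll_threads


# Hypothetical ONCONFIG excerpt used only for illustration.
SAMPLE_ONCONFIG = """\
# network settings
NETTYPE soctcp,8,200,NET
NETTYPE ipcshm,1,50,CPU
NUMFDSERVERS 1
"""
```

For SAMPLE_ONCONFIG the function reports NUMFDSERVERS as 1 and eight poll threads for soctcp. On a version without NUMFDSERVERS the first value is None, and the poll thread counts are the only setting available to reduce the number of fd-servers.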


Document Information

Modified date:
16 June 2018

UID

swg21145897