Important Updates: Tivoli Storage Manager Server 1024 Database Connection Limit on AIX

Flash (Alert)


Abstract

The Tivoli Storage Manager server cannot establish more than 1024 simultaneous connections to its DB2 database when using local, IPC-based connections on AIX. Servers with large workloads might reach this limit and fail with a number of different symptoms. Internal errors in the server can also cause it to run out of connections even when the server is not running a large workload. A workaround is available for V6.1 and V6.2. A fix is available in V6.3.

Content


Summary

  • The 1024 maximum connection limit, when using local database connections, is a permanent restriction on AIX systems. You can bypass the limitation by reconfiguring the server database to use TCP/IP to connect to the database instead of local connections.

    Beginning with level 6.3.0, the server will automatically use TCP/IP connections. Both new servers and existing servers that are upgraded from earlier versions will be automatically reconfigured to use TCP/IP. No manual action is required when using level 6.3.0 or higher.
  • You can convert to using TCP/IP connections using the dsmcvt2tcp script or by following the manual steps detailed below in section "Instructions for converting to TCP/IP connections." The script is installed with the server beginning in levels 6.1.4 and 6.2.1. You can also download it at: public.dhe.ibm.com.
  • There are security implications to switching to TCP/IP connections. The dsmcvt2tcp script will set up firewall rules to reduce the impact of the change. Instructions for manually setting up the firewall rules are detailed below in section "Security implications of using TCP/IP connections."

    Note: The Tivoli Storage Manager V6.3 server fixes the security concerns associated with the workaround. As a result, firewall rules are suggested only when using the workaround with V6.1 or V6.2 servers.
  • A known problem on AIX (APAR IZ75548) can cause severe performance problems when AIX IP Security is enabled with 10 Gb network interfaces. A fix is available.
  • An additional server problem has been discovered that might aggravate the problem by causing a rapid increase in the number of client sessions. A fix for the problem (APAR PK84259) is available in levels 6.1.3.4, 6.1.4, and 6.2.1.
  • Tivoli Storage Manager server APAR IC76335 documents an error in which the server might fail to close unused connections to the database. This can cause the server to establish more than 1024 simultaneous connections to the database even when the server is not heavily loaded. Refer to http://www.ibm.com/support/docview.wss?uid=swg1IC76335 for more information about the APAR and the circumstances that can lead to the server failing to close connections to the database. APAR IC76335 is projected to be fixed in levels 6.1.5.1 and 6.1.6, and is available in levels 6.2.3 and 6.3.0.
  • Some maintenance activities can cause the server to resume using local connections. If this problem occurs, it will be necessary to repeat the steps used to configure the server database to use TCP/IP. Section "Maintenance activities that affect TCP/IP connections" provides details on the activities that can cause this.
  • If the server is running on a high availability cluster using solutions such as IBM PowerHA or Tivoli System Automation, then the steps that you take to configure the server database to use TCP/IP might need to be repeated on each node of the cluster.

Symptoms

Tivoli Storage Manager server operations might fail when the server is under heavy workload. Failure symptoms include some or all of the following server messages:

  • ANR0171I dbieval.c(827): Error detected on 0:1026, database in evaluation mode.
  • ANR9999D_0645605689 tbOpenX(tbtbl.c:4273) Thread<nnn>: Failure participating on transaction.
  • ANR9999D_1910305017 smExecuteSession(smexec.c:2379) Thread<nnn>: Session NNNNN with client CLIENTNAME (PLATFORM) rejected - error creating the central logging vector.
  • ANR0105E sspool.c(4451): Error setting search bounds for table "SS.Pools".
  • ANR0102E admactlg.c(4212): Error 16 inserting row in table "Activity.Log".
  • ANR2009E Activity log process has stopped - database error.
  • ANR2006E Activity log process was not started, the default output stream cannot be opened

After issuing the messages, the server might hang or crash, although this does not happen in all cases.

Diagnosing the Problem

  1. Locate the db2diag.log for the database instance.

    The DIAGPATH database manager configuration parameter identifies the location of the log file. Issue this command to determine the value of the configuration parameter:

    db2 get dbm cfg | grep DIAGPATH
  2. Inspect the db2diag.log file. The db2diag.log file records errors that occur near the time of the failure. The specific errors that indicate that the 1024 connection limit has been reached are:

    2010-03-23-11.21.00.569901-420 I8863A430          LEVEL: Warning
    PID     : 741630               TID  : 1286        PROC : db2sysc
    INSTANCE: tsminst1             NODE : 000
    EDUID   : 1286                 EDUNAME: db2ipccm
    FUNCTION: DB2 UDB, fast comm manager, sqkfDynamicResourceMgr::AdjustResources, probe:100
    MESSAGE : FCM Automatic/Dynamic Resource Adjustment (Session): 256 successfully
             allocated. New total is 1152

    2010-03-23-11.21.01.654777-420 E9294A500          LEVEL: Error (OS)
    PID     : 704702               TID  : 1           PROC : dbconn.aix
    INSTANCE: tsminst1             NODE : 000
    EDUID   : 1
    FUNCTION: DB2 UDB, oper system services, sqloOpenMLNQue, probe:3
    MESSAGE : ZRC=0x870F003E=-2029060034=SQLO_QUE_BAD_HANDLE "Bad Queue Handle"
             DIA8555C An invalid message queue handle was encountered.
    CALLED  : OS, -, semop
    OSERR   : EINVAL (22) "A system call received a parameter that is not valid."
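
As a rough aid for step 2, the following sketch counts how many lines in a db2diag.log file carry the SQLO_QUE_BAD_HANDLE code shown in the example above. The function name count_conn_limit_errors is hypothetical, and the sketch assumes the error text appears in the log exactly as in the sample entry. A non-zero count only suggests that the limit was reached; review the matching entries by hand to confirm.

```shell
#!/bin/sh
# count_conn_limit_errors LOGFILE
# Prints the number of lines in LOGFILE that contain the
# SQLO_QUE_BAD_HANDLE code from the sample db2diag.log entry.
# (Hypothetical helper name; not part of DB2 or Tivoli Storage Manager.)
count_conn_limit_errors() (
    log=$1
    # grep -c prints the count of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'SQLO_QUE_BAD_HANDLE' "$log" || true
)
```

For example, if DIAGPATH is /home/tsminst1/sqllib/db2dump (an assumed location), run: count_conn_limit_errors /home/tsminst1/sqllib/db2dump/db2diag.log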

Resolving the Problem

Instructions for converting to TCP/IP connections

The dsmcvt2tcp script can be used to convert an existing Tivoli Storage Manager database instance to use TCP/IP connections. The script will be shipped with the AIX server beginning in levels 6.1.4 and 6.2.1.
It can also be downloaded at: ftp://public.dhe.ibm.com/storage/tivoli-storage-management/tools/aix/dsmcvt2tcp

Note: The dsmcvt2tcp script is not shipped with level 6.3.0 because the server will automatically use TCP/IP connections.

After you have downloaded the script, run it with the -h parameter for usage instructions.

Alternatively, follow these instructions to manually convert to TCP/IP connections. The following instructions assume that the Tivoli Storage Manager DB2 instance user is tsminst1. If the instance user has a different name, then you must replace any occurrences of "tsminst1" with the actual name of the instance user.
  1. Find the service name in /etc/services that the DB2 instance uses. You can find this by running this command:

    grep DB2 /etc/services

    Tip: The service name that the tsminst1 instance uses is typically named "DB2_tsminst1".

    If there is no entry for the database instance in the /etc/services file, you must add one (while logged in as root). The DB2 service uses a range of four contiguous port numbers, and cannot use any number that is being used by another service. Also note that no port number can be greater than 65535. Search /etc/services for the proposed port numbers before using them.

    DB2_tsminst1 60000/tcp
    DB2_tsminst1_1 60001/tcp
    DB2_tsminst1_2 60002/tcp
    DB2_tsminst1_END 60003/tcp


    Note: The white space in the preceding example is a single tab character, not spaces.
  2. Log in as the Tivoli Storage Manager instance user.
  3. Run this command:

    db2 list database directory

    Take note of the 'Local database directory' value associated with the TSMDB1 database. This is typically set to the instance home directory, but might have a different value.
  4. Run the following command:

    db2 get dbm cfg

    Note the values of the settings for the AUTHENTICATION, TRUST_CLNTAUTH, SVCENAME, and NUM_POOLAGENTS parameters. You will need this information if you have to back out the change at a later time.
  5. Shut down the Tivoli Storage Manager server.
  6. Run the following command to ensure that the DB2 server is shut down:

    db2stop force
  7. Run the following commands:

    db2 update dbm cfg using AUTHENTICATION CLIENT
    db2 update dbm cfg using TRUST_CLNTAUTH CLIENT
    db2 update dbm cfg using SVCENAME DB2_tsminst1
    db2 update dbm cfg using NUM_POOLAGENTS 0

    Note: The value specified for SVCENAME should match the service name in /etc/services. Also note that NUM_POOLAGENTS is set to 0 to circumvent a crash in DB2 that has been observed after setting it up to use TCP/IP, but is not strictly needed for the conversion itself.
  8. Run the following command:

    db2 uncatalog database TSMDB1
  9. Run the following command:

    db2 catalog tcpip node loopbk remote 127.0.0.1 server DB2_tsminst1

    The last parameter should match the service name in /etc/services.
  10. Run the following command:

    db2 catalog database TSMDB1 as TSMDB1_L on /home/tsminst1

    The last parameter (the directory name) should match the directory from the "db2 list database directory" command in step 3.
  11. Run the following command:

    db2 catalog database TSMDB1_L as TSMDB1 at node loopbk
  12. Run the following command:

    db2set DB2COMM=TCPIP
  13. Run "db2start" to start the database manager. Issue the following command to verify that the configuration parameters that were set in step 7 were changed to the appropriate new values:

    db2 get dbm cfg
  14. Run the following commands to verify that you can connect to the database:

    db2 connect to TSMDB1
    db2 disconnect current

    If the connection is successful then Tivoli Storage Manager should start up normally.
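
Two of the manual steps above can be backed by small checks. The sketch below is illustrative only: the function names are hypothetical, check_db2_ports looks only at tcp entries in the file it is given, and dbm_cfg_value assumes that "db2 get dbm cfg" prints each parameter in the form "(PARAMETER) = value". Neither function changes any configuration.

```shell
#!/bin/sh
# check_db2_ports SERVICES_FILE BASE_PORT
# Step 1 helper: prints "ok" if BASE_PORT through BASE_PORT+3 are valid
# port numbers and none has a tcp entry in SERVICES_FILE (normally
# /etc/services). Otherwise prints the reason and returns non-zero.
check_db2_ports() (
    services=$1
    base=$2
    case $base in
        ''|*[!0-9]*) echo "not a number: $base"; exit 1 ;;
    esac
    last=$((base + 3))
    if [ "$base" -lt 1 ] || [ "$last" -gt 65535 ]; then
        echo "out of range: $base-$last"
        exit 1
    fi
    port=$base
    while [ "$port" -le "$last" ]; do
        # Field 2 of a services entry is PORT/PROTOCOL, such as 60000/tcp.
        if awk -v p="$port/tcp" \
            '$1 !~ /^#/ && $2 == p { found = 1 } END { exit !found }' \
            "$services"; then
            echo "in use: $port"
            exit 1
        fi
        port=$((port + 1))
    done
    echo "ok"
)

# dbm_cfg_value PARAMETER
# Steps 4 and 13 helper: reads "db2 get dbm cfg" output on standard
# input and prints the text that follows "(PARAMETER) =".
dbm_cfg_value() (
    param=$1
    sed -n "s/^.*($param) = *//p"
)
```

For example, "check_db2_ports /etc/services 60000" checks the range used in step 1, and "db2 get dbm cfg | dbm_cfg_value SVCENAME" prints the current service name so that it can be recorded in step 4 or compared in step 13.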

Note: If Step 13 is slow, or if the server startup is abnormally slow, then your environment might have a DNS server set to ignore TCP/IP v6, which delays DB2 connections. This can be fixed by modifying /etc/netsvc.conf to only look up TCP/IP v4 addresses, assuming your environment does not use TCP/IP v6. For example, change /etc/netsvc.conf from:

hosts=local,bind

to:

hosts=local4,bind4

Security implications of using TCP/IP connections

Configuring the database manager to accept TCP/IP connections allows remote users to connect to the database. To prevent remote users from connecting, and to ward off potential security risks associated with remote users, consider configuring the AIX IP Security feature to only allow local users to connect to the database. Note, however, that in some cases, configuring IP Security can result in a severe performance degradation. Work-arounds are documented in AIX APAR IZ75548, and are also described below in "Performance implications of using TCP/IP connections."

Log in as root, and complete the following steps to set up IP Security to allow access to only local users. Note that these steps assume that the first port on which service for the instance was configured to listen is 60000. If your instance uses a different port, then substitute that port number for any occurrence of 60000.
  1. Run the following command to enable IP Security for IP V4:

    /usr/sbin/mkdev -c ipsec -t 4

  2. Run the following command to allow local users to connect to port 60000:

    /usr/sbin/genfilt -v 4 -a P -s 127.0.0.1 -m 255.255.255.255 -d 0.0.0.0 -i all -M 0.0.0.0 -c tcp -o any -p 0 -O eq -P 60000 -w I

  3. Run the following command to deny remote access to port 60000:

    /usr/sbin/genfilt -v 4 -a D -s 0.0.0.0 -m 0.0.0.0 -d 0.0.0.0 -M 0.0.0.0  -c tcp -o any -p 0 -O eq -P 60000 -w I -i all

  4. Run the following command to activate the two new filter rules:

    /usr/sbin/mkfilt -v4 -u
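
If the instance does not listen on port 60000, the substitution described above can be scripted. The following sketch is an assumption-laden convenience, not part of the product: the function names are hypothetical, service_port assumes the /etc/services layout shown earlier (service name, then port/tcp), and print_ipsec_rules only prints the four commands from the steps above so that they can be reviewed before being run as root.

```shell
#!/bin/sh
# service_port SERVICES_FILE SERVICE_NAME
# Prints the port number of the first entry whose name is SERVICE_NAME.
service_port() (
    awk -v name="$2" \
        '$1 == name { split($2, a, "/"); print a[1]; exit }' "$1"
)

# print_ipsec_rules PORT
# Prints the commands from steps 1 to 4 with PORT in place of 60000.
# Nothing is executed by this function.
print_ipsec_rules() (
    port=$1
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; exit 1 ;;
    esac
    cat <<EOF
/usr/sbin/mkdev -c ipsec -t 4
/usr/sbin/genfilt -v 4 -a P -s 127.0.0.1 -m 255.255.255.255 -d 0.0.0.0 -i all -M 0.0.0.0 -c tcp -o any -p 0 -O eq -P $port -w I
/usr/sbin/genfilt -v 4 -a D -s 0.0.0.0 -m 0.0.0.0 -d 0.0.0.0 -M 0.0.0.0 -c tcp -o any -p 0 -O eq -P $port -w I -i all
/usr/sbin/mkfilt -v4 -u
EOF
)
```

For example, assuming the service is named DB2_tsminst1: print_ipsec_rules "$(service_port /etc/services DB2_tsminst1)"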

Performance implications of using TCP/IP connections

Initial tests using TCP/IP database connections have shown a 10-20% increase in overall CPU utilization. Throughput should not be affected, provided your system is not already close to using 100% of the CPU.

Users of 10 Gb network interfaces might encounter severe performance problems, as documented in AIX APAR IZ75548. This APAR reports throughput as low as 300 Kb/sec when the 10 Gb adapter's 'large receive offload' feature is enabled along with the AIX IP Security feature. A fix is available for IZ75548. However, if the fix has not been installed, then the recommended work-around for this issue is to do one of the following:
  • turn off large_receive,
  • turn off chksum_offload, or
  • turn off AIX IP Security.

Instructions for reverting to local connections

If you encounter problems using TCP/IP connections, complete the following steps to revert to local connections. You might also choose to revert to local connections if you reduce your server's workload to the point that you are no longer in danger of exceeding 1024 simultaneous database connections, and you want to reclaim the extra CPU utilization associated with using TCP/IP connections.
  1. Shut down the Tivoli Storage Manager server.
  2. Run the following command to ensure that the DB2 server is shut down:

    db2stop force

  3. Run the following command:

    db2 list database directory

    Take note of the 'Local database directory' value associated with the TSMDB1_L database. This is typically set to the instance home directory, but might have a different value.
  4. Run the following command:

    db2 uncatalog database TSMDB1
  5. Run the following command:

    db2 uncatalog database TSMDB1_L
  6. Run the following command:

    db2 catalog database TSMDB1 on /home/tsminst1

    The last parameter (the directory name) should match the directory from the "db2 list database directory" command in Step 3.
  7. Run the following commands:

    db2 update dbm cfg using AUTHENTICATION SERVER
    db2 update dbm cfg using SVCENAME \"\"
    db2 update dbm cfg using NUM_POOLAGENTS AUTOMATIC

    db2set -i tsminst1 db2comm=
  8. Run "db2start" to start the database manager.
  9. Run the following commands to verify that you can connect to the database:

    db2 connect to TSMDB1
    db2 disconnect current

    If the connection is successful, Tivoli Storage Manager should start up normally.

After reverting to local connections, you may also delete any AIX IP Security filters that were put in place when setting up for TCP/IP connections. Log in as root, and complete the following steps to remove IP Security filters, or to disable IP Security entirely:
  1. Run the following command to obtain a list of the current filters:

    /usr/sbin/lsfilt -v4
  2. Each filter rule has a unique identifying number. For each rule that was defined when setting up for TCP/IP connections, run the following command to remove the rule, making sure to specify the rule's identifying number with the -n parameter.

    /usr/sbin/rmfilt -v4 -n fid
  3. Run the following command to activate IP Security with the new set of rules:

    /usr/sbin/mkfilt -v4 -u

    or run the following command to deactivate IP Security filtering entirely:

    /usr/sbin/mkfilt -v4 -d
  4. (Optional) Run the following command to deactivate IP Security. Do not do this if you need IP Security to filter on any other ports.

    /usr/sbin/rmdev -dl ipsec

Maintenance activities that affect TCP/IP connections

Maintenance activities that uncatalog and recatalog, or delete and recreate the server database will result in a database that is configured to use local connections. Examples of such activities include:
  • Deleting the database instance using the db2idrop command. This will implicitly uncatalog the database, requiring that it be recataloged once the instance is recreated.
  • Deleting the database using the DB2 DROP DATABASE command before restoring the database from a backup copy.
  • Upgrading the server from version 6.1 to either 6.2.0 or 6.2.1. As part of the upgrade, existing database instances are dropped and recreated, and the server databases are recataloged. Version 6.2.2 correctly reconfigures the database to use TCP/IP connections; however, earlier 6.2 fix packs do not.

After performing one of these maintenance tasks, you can use the "db2 list database directory" command to determine whether the database is still configured to use TCP/IP connections.
  • If the command lists a single database entry that has the same value specified for both "Database alias" and "Database name" then local connections are being used. Repeat the steps used to configure the server database to use TCP/IP connections.

    $ db2 list database directory

    System Database Directory

    Number of entries in the directory = 1

    Database 1 entry:

    Database alias                       = TSMDB1
    Database name                        = TSMDB1
    Local database directory             = /home/tsminst1
    Database release level               = d.00
    Comment                              = TSM SERVER DATABASE
    Directory entry type                 = Indirect
    Catalog database partition number    = 0
    Alternate server hostname            =
    Alternate server port number         =
  • If the command lists two database entries that refer to one another, then the database is still configured to use TCP/IP connections and no further action is required.

    $ db2 list database directory

    System Database Directory

    Number of entries in the directory = 2

    Database 1 entry:

    Database alias                       = TSMDB1
    Database name                        = TSMDB1_L
    Node name                            = LOOPBK
    Database release level               = d.00
    Comment                              =
    Directory entry type                 = Remote
    Catalog database partition number    = -1
    Alternate server hostname            =
    Alternate server port number         =

    Database 2 entry:

    Database alias                       = TSMDB1_L
    Database name                        = TSMDB1
    Local database directory             = /home/tsminst1
    Database release level               = d.00
    Comment                              =
    Directory entry type                 = Indirect
    Catalog database partition number    = 0
    Alternate server hostname            =
    Alternate server port number         =
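
The check described above can be sketched as a small filter. The function name connection_mode is hypothetical, and the sketch assumes the output layout shown in the two examples: it reports "tcpip" when the TSMDB1 alias has a directory entry type of Remote, "local" when it is Indirect, and "unknown" when no TSMDB1 alias is found.

```shell
#!/bin/sh
# connection_mode
# Reads "db2 list database directory" output on standard input and
# classifies how the TSMDB1 alias is cataloged.
# (Hypothetical helper name; assumes the output layout shown above.)
connection_mode() (
    awk -F'= *' '
        /Database alias/ {
            dbalias = $2
            sub(/[ \t]+$/, "", dbalias)      # drop trailing white space
        }
        /Directory entry type/ {
            if (dbalias == "TSMDB1") {
                etype = $2
                sub(/[ \t]+$/, "", etype)
            }
        }
        END {
            if (etype == "Remote") print "tcpip"
            else if (etype == "Indirect") print "local"
            else print "unknown"
        }'
)
```

For example, run the following as the instance user: db2 list database directory | connection_mode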

Related information

More information about APAR PK84259
DSMCVT2TCP Script

Document information


More support for:

Tivoli Storage Manager
Server

Software version:

6.1, 6.2

Operating system(s):

AIX

Software edition:

All Editions

Reference #:

1428557

Modified date:

2011-02-24
