IBM Support

db2start failing with SQL6048N error.

Troubleshooting


Problem

How do you resolve a db2start failure with the SQL6048N error in an MPP/DPF or single-partition environment?

Symptom

MPP/DPF (Database Partitioning Feature)
The following error is returned when db2start is issued and tries to establish a connection with all the nodes defined in the sqllib/db2nodes.cfg file, including the new node you attempted to add:

Example: $home/../sqllib> db2start

10/18/2011 03:54:21 1 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

10/18/2011 03:54:22 2 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

10/18/2011 03:54:23 3 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

10/18/2011 03:54:25 4 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

10/18/2011 03:54:25 0 0 SQL1026N The database manager is already active.

SQL6032W Start command processing was attempted on "5" node(s). "0" node(s) were successfully started. "1" node(s) were already started. "4" node(s) could not be started.

The db2diag.log entries are similar to the following when the host name is not fully qualified:

2011-10-18-01.57.59.083502-240 E3289784E602 LEVEL: Error
PID : 7397 TID : 47343526210544 PROC : db2start
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand,
probe:110
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 204 bytes
The remote shell program terminated prematurely. The most likely causes are either that the DB2RSHCMD registry variable is set to an invalid setting, or the remote command program failed to authenticate.
DATA #2 : String, 12 bytes
/usr/bin/ssh

2011-10-18-01.57.59.083676-240 E3290387E504 LEVEL: Error
PID : 7397 TID : 47343526210544 PROC : db2start
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand,
probe:200
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 37 bytes
myserver.spifnet.ibm.com
DATA #2 : String, 17 bytes
myserver
DATA #3 : String, 38 bytes
db2rcmd: Failed to getaddrinfo, rc -2

The last entry shows that the failure is in getaddrinfo, that is, in host name resolution.
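
A getaddrinfo failure suggests checking whether every host name in db2nodes.cfg resolves on the local machine. The following is a minimal sketch, not an official Db2 tool. It assumes the usual db2nodes.cfg layout with the host name in the second column, the default location $HOME/sqllib/db2nodes.cfg, and a system that provides the getent utility (typical on Linux). The function name unresolved_hosts is hypothetical, and a getent lookup only approximates the lookup that db2start performs.

```shell
# Print each distinct host name (column 2 of a db2nodes.cfg-style file)
# that the local resolver cannot look up.
unresolved_hosts() {
    awk 'NF >= 2 { print $2 }' "$1" | sort -u | while read -r host; do
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Assumed default location of the node configuration file.
nodes_cfg="$HOME/sqllib/db2nodes.cfg"
if [ -f "$nodes_cfg" ]; then
    unresolved_hosts "$nodes_cfg"
fi
```

Any name that is printed is a candidate for the /etc/hosts or DNS correction described under Resolving The Problem.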

Single Partition
SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.

2019-01-01-00.22.25.305962-300 E37558E422            LEVEL: Error (OS)
PID     : 10070                TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1             NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloRemoteShell, probe:50
CALLED  : OS, -, execvp                           OSERR: ENOENT (2)
MESSAGE : Error invoking remote shell program.
DATA #1 : String, 12 bytes
/usr/bin/rsh
2019-01-01-00.22.25.807918-300 E37981E388            LEVEL: Severe
PID     : 10058                TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1             NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:3599
MESSAGE : ZRC=0xFFFFFFFF=-1
DATA #1 : <preformatted>
Waitpid failure for [10070] rc was [-1], errno was 10
2019-01-01-00.22.25.808648-300 E38370E770            LEVEL: Warning
PID     : 10058                TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1             NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:110
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 348 bytes
The remote shell program terminated prematurely.  The most likely causes are either that the DB2RSHCMD registry variable is set to an invalid setting, or the remote command program failed to authenticate.  It can also be the remote daemon is not completely started up yet to handle the request. This attempt will retry a few times before giving up.
DATA #2 : String, 12 bytes
/usr/bin/rsh
2019-01-01-00.22.25.809177-300 E39141E502            LEVEL: Error
PID     : 10058                TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1             NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:200
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 9 bytes
test.i /* Hostname test does not match the other hostname test.i */
DATA #2 : String, 9 bytes
test.i
DATA #3 : String, 51 bytes
No diagnostics available from remote shell program.
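
In the first entry above, execvp returns ENOENT for /usr/bin/rsh, which means the remote shell program itself could not be found. A quick way to confirm this is to test whether the program exists and is executable. This is only a sketch: the helper name remote_shell_ok is hypothetical, and the two paths are simply the ones that appear in the log excerpts in this document. If the DB2RSHCMD registry variable is set on your system, check that path instead.

```shell
# Succeed only if the given remote shell program exists and is executable.
remote_shell_ok() {
    [ -x "$1" ]
}

# Paths taken from the db2diag.log excerpts; adjust to your DB2RSHCMD value.
for cmd in /usr/bin/rsh /usr/bin/ssh; do
    if remote_shell_ok "$cmd"; then
        echo "$cmd: present"
    else
        echo "$cmd: missing or not executable"
    fi
done
```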

Cause

MPP/DPF Cause

The issue can be caused by one of the following:
  • The logical MPP instance does not have a .rhosts file under the instance HOME directory.
  • The logical MPP instance has a .rhosts file with incorrect entries.
  • A password change is required for the instance user.
  • The host name in the /etc/hosts file is not fully qualified.

Single Partition Cause
  • The hostname test does not match test.i because the hostname in ~/sqllib/db2nodes.cfg maps to an IP address that belongs to a network card that is down.

Diagnosing The Problem


Verify the following:

- Check that a .rhosts file exists under the instance HOME directory with correct entries. An alternative to using a .rhosts file is to use the /etc/hosts.equiv file. The /etc/hosts.equiv file would contain the exact same entries as the .rhosts file, but must be created on each computer.

- the node has the proper authorization defined in the .rhosts or the /etc/hosts.equiv files.

- the application is not using more than (500 + (1995 - 2 * total_number_of_nodes)) file descriptors at the same time.

- the Enterprise Server Edition environment variables are defined in the profile file.

- the profile file is written in the Korn Shell script format.

- all the host names defined in the db2nodes.cfg file in the sqllib directory are defined on the network and are running.

- the DB2FCMCOMM registry variable is set correctly.

- all the nodes defined in the sqllib/db2nodes.cfg file including the new node you attempted to add are correct.

- the server is in the same domain as the IBM® data server client
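
As a worked example of the file descriptor limit in the checklist above, the formula 500 + (1995 - 2 * total_number_of_nodes) can be evaluated with shell arithmetic. The function name max_file_descriptors is hypothetical, and the node count of 5 is taken from the db2start output shown under Symptom.

```shell
# Upper bound from the checklist: 500 + (1995 - 2 * total_number_of_nodes)
max_file_descriptors() {
    echo $(( 500 + (1995 - 2 * $1) ))
}

max_file_descriptors 5    # prints 2485
```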

Resolving The Problem

MPP / DPF Resolution
There are three possible resolutions to the problem:

1) Create a .rhosts or /etc/hosts.equiv file with correct entries.

2) Log in and change the password of the instance user.

A password change is required on servers that force the user to set a new password at first login for security reasons. This is commonly the case on AIX servers.

The detailed steps for resolutions 1 and 2 follow.

a) Create a .rhosts file under the instance HOME directory in the following format:

<hostname> <instance_name>

where <instance_name> is the name of the instance for which logical MPP is configured, and <hostname> is the name of the DB2 server (issue the hostname command to get it).

Example:
machine1.in.ibm.com db2inst1
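
Step (a) can be sketched as a small shell helper. This is an illustration only, under these assumptions: the helper name add_rhosts_entry is hypothetical, the entry format is the "<hostname> <instance_name>" line described above, and the file is restricted to mode 600 because rsh daemons commonly ignore a .rhosts file that other users can write to.

```shell
# Append "<hostname> <instance_name>" to an rhosts-style file unless the
# exact line is already present, then restrict the file to its owner.
add_rhosts_entry() {
    file=$1
    entry="$2 $3"
    touch "$file"
    grep -qxF "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
    chmod 600 "$file"
}

# Example call, to be run as the instance owner:
# add_rhosts_entry "$HOME/.rhosts" machine1.in.ibm.com db2inst1
```

Calling the helper a second time with the same arguments leaves the file unchanged, so it is safe to rerun.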

b) Log in as the instance user and set a new password.

Example:
Step 1: su - <instance_name>

Step 2: passwd

Changing password for "db2inst1"
db2inst1's Old password:
db2inst1's New password:
Enter the new password again:

After completing steps (a) and (b), run the following command as the instance user to verify that logical MPP is configured correctly.

Example:

db2inst1 is the instance configured with logical MPP.

# su - db2inst1

db2inst1@vminstlnx64test16:~/sqllib> db2_all date

Wed Aug 24 06:52:21 EDT 2011
vminstlnx64test16: date completed ok

Wed Aug 24 06:52:22 EDT 2011
vminstlnx64test16: date completed ok

The output above shows that logical MPP is configured correctly, and db2start should now work.

If the command "db2_all date" returns "Permission denied" as shown below, there is a configuration issue:

db2inst1@vminstlnx64test16:~> db2_all date

Permission denied.

Permission denied.

db2inst1@vminstlnx64test16:~>

A permission denied error means that rsh to the host is not configured properly. This can have many system-related causes, for example:

i. The rsh service is not running on the system.

ii. The hostname specified in db2nodes.cfg or .rhosts is not correct.

3) If you are using the /etc/hosts file, it should use the fully qualified name. If the fully qualified name is not used in the db2nodes.cfg file and in the /etc/hosts file, you might also receive error message SQL30082N RC=3.

For example, change myserver to myserver.spifnet.ibm.com.
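
To find entries that may need this change, the short host names in db2nodes.cfg can be listed. The sketch below assumes the host name is in the second column of the file and uses a simple heuristic: a name without a dot is treated as not fully qualified. The function name short_host_names is hypothetical.

```shell
# Print each distinct host name (column 2) that contains no dot,
# which is used here as a heuristic for "not fully qualified".
short_host_names() {
    awk 'NF >= 2 && $2 !~ /\./ { print $2 }' "$1" | sort -u
}

# Assumed default location of the node configuration file.
nodes_cfg="$HOME/sqllib/db2nodes.cfg"
if [ -f "$nodes_cfg" ]; then
    short_host_names "$nodes_cfg"
fi
```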

Single Partition Resolution
The host name checked below should match DB2SYSTEM and the entry in ~/sqllib/db2nodes.cfg.
Ensure that the IP address it resolves to belongs to a network card that is up and running:
$ host mytest
mytest.ibm.com has address 192.1.1.55
$ ping 192.1.1.55
Bring up the network card, or change the hostname in ~/sqllib/db2nodes.cfg to one that exists.
If the DB2SYSTEM registry variable is set, also run db2set db2system=NewHostName so that it matches the new hostname.
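
The comparison between the machine's hostname and db2nodes.cfg can also be scripted. This is a minimal sketch, assuming the host name is in the second column of db2nodes.cfg and that the output of the hostname command is the name to compare. The function name host_in_nodes_cfg is hypothetical.

```shell
# Succeed if the given host name appears in column 2 of the given
# db2nodes.cfg-style file (exact match).
host_in_nodes_cfg() {
    awk -v h="$1" '$2 == h { found = 1 } END { exit !found }' "$2"
}

# Assumed default location of the node configuration file.
nodes_cfg="$HOME/sqllib/db2nodes.cfg"
if [ -f "$nodes_cfg" ]; then
    if host_in_nodes_cfg "$(hostname)" "$nodes_cfg"; then
        echo "hostname matches an entry in db2nodes.cfg"
    else
        echo "hostname $(hostname) not found in db2nodes.cfg"
    fi
fi
```

The match is exact, so a short name such as test and a longer form such as test.i are reported as different, which mirrors the mismatch shown in the Symptom section.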


Document Information

Modified date:
04 March 2020

UID

swg21578906