Troubleshooting
Problem
How do you resolve db2start failing with error SQL6048N in an MPP/DPF or single-partition environment?
Symptom
MPP/DPF (Data Partitioning Feature)
The following errors are returned when db2start tries to establish connections with all the nodes defined in the sqllib/db2nodes.cfg file, including any new node you attempted to add:
Example: $home/../sqllib> db2start
10/18/2011 03:54:21 1 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
10/18/2011 03:54:22 2 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
10/18/2011 03:54:23 3 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
10/18/2011 03:54:25 4 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
10/18/2011 03:54:25 0 0 SQL1026N The database manager is already active.
SQL6032W Start command processing was attempted on "5" node(s). "0" node(s) were successfully started. "1" node(s) were already started. "4" node(s) could not be started.
When the host name is not fully qualified, the db2diag.log entries are similar to the following:
2011-10-18-01.57.59.083502-240 E3289784E602 LEVEL: Error
PID : 7397 TID : 47343526210544 PROC : db2start
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:110
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 204 bytes
The remote shell program terminated prematurely. The most likely causes are either that the DB2RSHCMD registry variable is set to an invalid setting, or the remote command program failed to authenticate.
DATA #2 : String, 12 bytes
/usr/bin/ssh
2011-10-18-01.57.59.083676-240 E3290387E504 LEVEL: Error
PID : 7397 TID : 47343526210544 PROC : db2start
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:200
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 37 bytes
myserver.spifnet.ibm.com
DATA #2 : String, 17 bytes
myserver
DATA #3 : String, 38 bytes
db2rcmd: Failed to getaddrinfo, rc -2
This indicates that the problem is with host name resolution: the getaddrinfo() call made by db2rcmd failed with rc -2.
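Because the failure is in name resolution, one way to reproduce it outside DB2 is to call getaddrinfo() on each host name yourself. The Python sketch below is illustrative only: check_resolves is a hypothetical helper, not a DB2 tool, and it assumes Python 3 is available on the affected server. On Linux (glibc), a gaierror code of -2 is socket.EAI_NONAME ("name not known"), which matches the rc that db2rcmd reports.

```python
import socket


def check_resolves(hostname):
    """Hypothetical helper: try to resolve hostname with getaddrinfo().

    Returns (True, sorted list of IP addresses) on success, or
    (False, EAI_* error code) if getaddrinfo() fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # exc.errno carries the EAI_* code; on Linux/glibc, -2 is EAI_NONAME.
        return False, exc.errno
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return True, sorted({info[4][0] for info in infos})
```

For example, comparing check_resolves("myserver") with check_resolves("myserver.spifnet.ibm.com") shows whether the short name, the fully qualified name, or both resolve on this host.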
Single Partition
SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
2019-01-01-00.22.25.305962-300 E37558E422 LEVEL: Error (OS)
PID : 10070 TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1 NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloRemoteShell, probe:50
CALLED : OS, -, execvp OSERR: ENOENT (2)
MESSAGE : Error invoking remote shell program.
DATA #1 : String, 12 bytes
/usr/bin/rsh
2019-01-01-00.22.25.807918-300 E37981E388 LEVEL: Severe
PID : 10058 TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1 NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:3599
MESSAGE : ZRC=0xFFFFFFFF=-1
DATA #1 : <preformatted>
Waitpid failure for [10070] rc was [-1], errno was 10
2019-01-01-00.22.25.808648-300 E38370E770 LEVEL: Warning
PID : 10058 TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1 NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:110
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 348 bytes
The remote shell program terminated prematurely. The most likely causes are either that the DB2RSHCMD registry variable is set to an invalid setting, or the remote command program failed to authenticate. It can also be the remote daemon is not completely started up yet to handle the request. This attempt will retry a few times before giving up.
DATA #2 : String, 12 bytes
/usr/bin/rsh
2019-01-01-00.22.25.809177-300 E39141E502 LEVEL: Error
PID : 10058 TID : 139872053577600 PROC : db2start
INSTANCE: db2inst1 NODE : 000
HOSTNAME: test
FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:200
MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
DATA #1 : String, 9 bytes
test.i /* Hostname test does not match the other hostname test.i */
DATA #2 : String, 9 bytes
test.i
DATA #3 : String, 51 bytes
No diagnostics available from remote shell program.
Cause
MPP/DPF Cause
The issue may be caused by one of the following:
- The logical MPP instance does not have a .rhosts file in the instance HOME directory.
- The .rhosts file of the logical MPP instance has incorrect entries.
- A password change may be required for the instance user.
- The host name is not fully qualified in the /etc/hosts file.
Single Partition Cause
- The host name test does not match test.i because the host name in ~/sqllib/db2nodes.cfg maps to an IP address that belongs to a network card which is down.
Diagnosing The Problem
Verify the following:
- Check that a .rhosts file with correct entries exists in the instance HOME directory. An alternative to a .rhosts file is the /etc/hosts.equiv file, which contains the same entries as the .rhosts file but must be created on each computer.
- the node has the proper authorization defined in the .rhosts or the /etc/hosts.equiv files.
- the application is not using more than (500 + (1995 - 2 * total_number_of_nodes)) file descriptors at the same time.
- the Enterprise Server Edition environment variables are defined in the profile file.
- the profile file is written in the Korn Shell script format.
- all the host names defined in the db2nodes.cfg file in the sqllib directory are defined on the network and are running.
- the DB2FCMCOMM registry variable is set correctly.
- all the nodes defined in the sqllib/db2nodes.cfg file including the new node you attempted to add are correct.
- the server is in the same domain as the IBM® data server client
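Two of the checks above can be scripted. The Python sketch below is a minimal illustration, not an IBM-supplied tool: parse_db2nodes, max_file_descriptors and unqualified_hosts are hypothetical helpers. It assumes the usual db2nodes.cfg layout (partition number, host name, logical port, then optional netname and resource-set columns) and simply evaluates the file-descriptor formula from the checklist.

```python
def parse_db2nodes(text):
    """Parse db2nodes.cfg text into (partition, hostname, logical_port) tuples.

    Assumed layout per line: partition number, host name, logical port,
    then optional netname/resource-set columns, which are ignored here.
    """
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumption: a missing logical port is treated as 0.
        port = int(fields[2]) if len(fields) > 2 else 0
        nodes.append((int(fields[0]), fields[1], port))
    return nodes


def max_file_descriptors(total_number_of_nodes):
    """File descriptors the application should stay under, per the checklist."""
    return 500 + (1995 - 2 * total_number_of_nodes)


def unqualified_hosts(nodes):
    """Host names with no domain part, which may not resolve on every node."""
    return sorted({host for _, host, _ in nodes if "." not in host})
```

With the five partitions from the SQL6032W example above, max_file_descriptors(5) evaluates to 500 + (1995 - 10) = 2485.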
Resolving The Problem
MPP / DPF Resolution
There are three possible resolutions to the problem:
1) Create a .rhosts (or /etc/hosts.equiv) file with correct entries.
2) Log in and change the password of the instance user.
A password change is required on servers that force users to set a new password at first login for security reasons. This is usually the case on AIX servers.
The detailed steps for resolving the issue are as follows:
a) Create a .rhosts file in the instance HOME directory, using the following format:
<hostname> <instance_name>
where <instance_name> is the name of the instance for which logical MPP is configured, and <hostname> is the name of the DB2 server (issue the hostname command to get it).
Example:
machine1.in.ibm.com db2inst1
b) Log in as the instance user and set a new password.
Example:
Step 1 : su - <instance_name>
Step 2 : passwd
Changing password for "db2inst1"
db2inst1's Old password:
db2inst1's New password:
Enter the new password again:
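Step (a) can also be scripted. The following Python sketch uses a hypothetical helper, ensure_rhosts_entry, which is not part of DB2: it appends the "<hostname> <instance_name>" line only if it is missing, and defaults the host name to socket.getfqdn(), which assumes that the fully qualified name of this machine is what db2nodes.cfg uses. It also sets the file mode to 0600, on the assumption that rsh typically ignores a .rhosts file that other users can write to.

```python
import os
import socket


def ensure_rhosts_entry(home_dir, instance_name, hostname=None):
    """Hypothetical helper: make sure home_dir/.rhosts has '<hostname> <instance_name>'.

    hostname defaults to socket.getfqdn(), the fully qualified name of this host.
    """
    if hostname is None:
        hostname = socket.getfqdn()
    path = os.path.join(home_dir, ".rhosts")
    text = ""
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    existing = [line.split() for line in text.splitlines()]
    if [hostname, instance_name] not in existing:
        if text and not text.endswith("\n"):
            text += "\n"  # keep the new entry on its own line
        text += f"{hostname} {instance_name}\n"
        with open(path, "w") as f:
            f.write(text)
    # Assumption: rsh typically rejects a .rhosts file writable by group or others.
    os.chmod(path, 0o600)
    return path
```

Run as the instance user, ensure_rhosts_entry(os.path.expanduser("~"), "db2inst1") would produce an entry like the machine1.in.ibm.com db2inst1 example above.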
After completing steps (a) and (b), run the following command as the instance user to verify that logical MPP is configured correctly:
Example:
db2inst1 is the instance configured with logical MPP.
#su - db2inst1
db2inst1@vminstlnx64test16:~/sqllib> db2_all date
Wed Aug 24 06:52:21 EDT 2011
vminstlnx64test16: date completed ok
Wed Aug 24 06:52:22 EDT 2011
vminstlnx64test16: date completed ok
The output above shows that logical MPP is configured correctly, so db2start should now succeed.
If the db2_all date command returns "Permission denied", as shown below, there is a configuration problem:
db2inst1@vminstlnx64test16:~> db2_all date
Permission denied.
Permission denied.
db2inst1@vminstlnx64test16:~>
A "Permission denied" error means that rsh to the host is not configured properly. This can have several system-related causes, including:
i. The rsh service is not running on the system.
ii. The host name specified in db2nodes.cfg or .rhosts is incorrect.
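If you run this verification often, the check can be automated. The Python sketch below is hypothetical (summarize_db2_all is not a DB2 utility); it assumes output in the format shown above, where each successful host prints "<host>: <command> completed ok" and failures print "Permission denied.".

```python
def summarize_db2_all(output, command="date"):
    """Hypothetical helper: split 'db2_all <command>' output into successes and failures.

    Returns (ok_hosts, denied_lines).
    """
    suffix = f": {command} completed ok"
    ok_hosts, denied_lines = [], []
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith(suffix):
            ok_hosts.append(line[: -len(suffix)])
        elif "Permission denied" in line:
            denied_lines.append(line)
    return ok_hosts, denied_lines
```

The output could be captured with subprocess.run(["db2_all", "date"], capture_output=True, text=True) run as the instance user; pass the combined .stdout and .stderr, since error text may be written to either. Any entry in denied_lines points to the rsh causes listed above.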
3) If you are using the /etc/hosts file, it should use the fully qualified host name, for example myserver.spifnet.ibm.com instead of myserver. If the fully qualified name is not used in both the db2nodes.cfg file and the /etc/hosts file, you might also receive error message SQL30082N RC=3.
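To find entries that still use a short name, you can scan /etc/hosts for canonical names without a domain part. The Python sketch below is a hypothetical helper (short_name_entries), not an IBM tool; it assumes the standard hosts(5) layout, where the first name after the IP address is the canonical host name, and it skips localhost and ip6-* entries.

```python
def short_name_entries(hosts_text):
    """Return (ip, canonical_name) pairs whose canonical name is not fully qualified."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, comment, or IP address with no name
        ip, canonical = fields[0], fields[1]
        if "." in canonical or canonical.startswith(("localhost", "ip6-")):
            continue  # already qualified, or a loopback/multicast entry
        flagged.append((ip, canonical))
    return flagged
```

For example, reading /etc/hosts and getting back [("192.0.2.10", "myserver")] (a hypothetical address) would mean that myserver should be changed to its fully qualified form, such as myserver.spifnet.ibm.com.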
Single Partition Resolution
The output of the hostname command should match the DB2SYSTEM registry variable and the host name in ~/sqllib/db2nodes.cfg.
Ensure that the IP address the host name resolves to belongs to a network card that is up and running:
$ host mytest
mytest.ibm.com has address 192.1.1.55
$ ping 192.1.1.55
Either bring up the network card or change the host name in ~/sqllib/db2nodes.cfg to one that exists.
If the DB2SYSTEM registry variable is set, also run db2set db2system=NewHostName so that it matches the new host name.
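The single-partition checks can be sketched in Python as well. This is a minimal illustration under stated assumptions, not an IBM tool: db2nodes_hostnames, is_local_address and check_hosts are hypothetical helpers. is_local_address binds a UDP socket to the address; binding fails when the address is not configured on an active local interface (on Linux this typically surfaces as EADDRNOTAVAIL), which is one way to spot an address that belongs to a network card that is down.

```python
import socket


def db2nodes_hostnames(db2nodes_text):
    """Return the set of host names (second column) listed in db2nodes.cfg."""
    names = set()
    for line in db2nodes_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names


def is_local_address(ip):
    """Return True if ip can be bound on this host, i.e. it is configured locally."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.bind((ip, 0))  # port 0 lets the OS choose any free port
    except OSError:
        return False
    return True


def check_hosts(db2nodes_text):
    """Map each db2nodes.cfg host name to [(ip, is_local)] for its IPv4 addresses."""
    report = {}
    for host in sorted(db2nodes_hostnames(db2nodes_text)):
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET)
        except socket.gaierror:
            infos = []  # the name does not resolve at all
        ips = sorted({info[4][0] for info in infos})
        report[host] = [(ip, is_local_address(ip)) for ip in ips]
    return report
```

Comparing db2nodes_hostnames(text) with {socket.gethostname()} covers the host name match; an empty list or a False entry from check_hosts(text) means the host name either does not resolve or resolves to an address that is not usable on this machine.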
Document Information
Modified date: 04 March 2020
UID: swg21578906