IBM Support

The DataStage command "orchadmin check" fails on remote node.

Troubleshooting


Problem

When the orchadmin check command is run with an APT configuration file that includes remote nodes, the command fails with the error: A remote host refused an attempted connect operation

Symptom

The full error reported by orchadmin check is:

$APT_ORCHHOME/bin/orchadmin check
##I IIS-DSEE-TFCN-00001 06:45:07(000) <main_program>
IBM WebSphere DataStage Enterprise Edition 8.5.0.6152
Copyright (c) 2001, 2005-2008 IBM Corporation. All rights reserved

##I IIS-DSEE-TFCN-00006 06:45:07(001) <main_program> conductor uname: - s=AIX; -r=1; -v=6; -n=xxxxxxxxxx; -m=00C594C24C00
##I IIS-DSEE-TCOA-00067 06:45:07(002) <main_program> OS charset: ISO-8859-1.
##I IIS-DSEE-TCOA-00068 06:45:07(003) <main_program> Input charset: UTF-8.
##I IIS-DSEE-TFSC-00001 06:45:07(004) <main_program> APT configuration file: /opt/IBM/InformationServer/Server/PXEngine/etc/config.apt
##W IIS-DSEE-TFPM-00152 06:45:42(000) <main_program> Accept timed out retries = 8
tss2a118n2.svr.us.servername.net: A remote host refused an attempted connect operation.
##W IIS-DSEE-TFPM-00152 06:46:12(000) <main_program> Accept timed out retries = 7
##E IIS-DSEE-TFPM-00153 06:46:12(001) <main_program> The section leader on xxxxxxxxxx died
##E IIS-DSEE-TFPM-00356 06:46:12(002) <main_program>

**** Parallel startup failed ****
This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.

##I IIS-DSEE-TFPM-00177 06:46:12(003) <main_program> Step started on
node xxxxxxxxxx; it uses 2 nodes.
The program running the step is /opt/IBM/InformationServer/Server/PXEngine/bin/orchadmin.
##I IIS-DSEE-TFPM-00178 06:46:12(004) <main_program> The ORCHESTRATE startup program in
/opt/IBM/InformationServer/Server/PXEngine/etc/standalone.sh is being used.
##I IIS-DSEE-TFPM-00181 06:46:12(005) <main_program> A startup script is not being used.
##I IIS-DSEE-TFPM-00183 06:46:12(006) <main_program> The TCP port being used for startup is 10,002; the associated socket number is 3.
##I IIS-DSEE-TFPM-00184 06:46:12(007) <main_program> Node status:

##I IIS-DSEE-TFPM-00185 06:46:12(008) <main_program> xxxxxxxxxx -
##I IIS-DSEE-TFPM-00186 06:46:12(009) <main_program> OK
##I IIS-DSEE-TFPM-00185 06:46:12(010) <main_program> xxxxxxxxxx -
##I IIS-DSEE-TFPM-00187 06:46:12(011) <main_program> rsh issued, no response received

##E IIS-DSEE-TFPM-00247 06:46:12(012) <main_program> Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.
##E IIS-DSEE-TFSC-00011 06:46:12(013) <main_program> Step execution finished with status = FAILED.
##E IIS-DSEE-TCOA-00069 06:46:12(014) <main_program> ERROR: check configuration file failed.

Resolving The Problem

DataStage jobs or commands that use remote nodes require that either rsh or ssh be configured to allow a passwordless connection to the remote system to spawn jobs.

When using ssh, it is set up via the $APT_ORCHHOME/etc/remsh file. The product install manual discusses creating this file if it does not exist.
Confirm that the permissions on the remsh file are at least 755, to ensure that the file can be read and executed by any process which needs it.
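
As a minimal sketch of the remsh setup described above, the following creates a wrapper that hands the parallel engine's remote shell calls to ssh and sets its mode to 755. The helper name make_remsh is hypothetical, and the ssh location /usr/bin/ssh is an assumption; check both the path and the file contents against the install manual for your release before using it.

```shell
#!/bin/sh
# Hypothetical helper: write an ssh wrapper to the given path and set mode 755.
# Assumption: ssh is installed at /usr/bin/ssh; adjust for your system.
make_remsh() {
    target=$1
    # The wrapper passes every argument it receives straight through to ssh.
    cat > "$target" <<'EOF'
#!/bin/sh
exec /usr/bin/ssh "$@"
EOF
    # 755 = owner read/write/execute, group and others read/execute.
    chmod 755 "$target"
}
```

It could be invoked as make_remsh "$APT_ORCHHOME/etc/remsh", followed by ls -l "$APT_ORCHHOME/etc/remsh" to confirm that the permissions show as -rwxr-xr-x.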

Also, test the connection to the remote server to confirm that rsh or ssh has been set up correctly, for example:
ssh remote_hostname date
or
rsh remote_hostname date

If the applicable command above does not work, or prompts for a password, then rsh/ssh communication has not been correctly configured on the conductor node, or access is blocked, possibly by a firewall.
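
When the configuration file lists several hosts, the same test can be repeated for each one. The sketch below is not part of the product: the helper names list_fastnames and check_hosts and the REMOTE_SHELL variable are hypothetical, and it assumes hosts appear in the configuration file as fastname "hostname" entries. It runs date on every host and reports which ones fail.

```shell
#!/bin/sh
# Hypothetical helper: print the unique fastname values in an APT configuration file.
list_fastnames() {
    # Matches entries such as:  fastname "hostname"
    sed -n 's/.*fastname[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Hypothetical helper: run "date" on each host through the remote shell.
# REMOTE_SHELL defaults to ssh in batch mode, so a password prompt is
# reported as a failure instead of blocking the script.
check_hosts() {
    rc=0
    for host in $(list_fastnames "$1"); do
        if ${REMOTE_SHELL:-ssh -o BatchMode=yes -o ConnectTimeout=10} "$host" date >/dev/null 2>&1; then
            echo "OK     $host"
        else
            echo "FAILED $host"
            rc=1
        fi
    done
    # Non-zero when at least one host could not be reached.
    return $rc
}
```

For example, check_hosts "$APT_CONFIG_FILE" tests every host with ssh, and REMOTE_SHELL=rsh check_hosts "$APT_CONFIG_FILE" does the same with rsh. Run it as the user that runs DataStage jobs, because passwordless access is configured per user.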


The following additional technotes have information which may be relevant to this issue:

Connection refused during DataStage job startup

A DataStage parallel job running on multiple nodes on a single server machine fails with error "**** Parallel startup failed ****"

DataStage Parallel Jobs Fail with errors on ssh_exchange_identification, rsh, or Section Leaders

Product: IBM InfoSphere DataStage
Platform: AIX
Version: 9.1, 8.7, 8.5, 8.1

Document Information

Modified date:
16 June 2018

UID

swg21645965