IBM Support

startego fails to start isflim if LSF is already running

Technote (troubleshooting)




PCMAE runs /etc/init.d/startego on a compute node to start isflim, so that the node can join EGO.

However, the script fails to start isflim if LSF is already running, even though its output reports that the ego agent has started. egosh resource list then shows the node as unavailable.

node003: ego agent has been started.
node003: ego agent is started.
node003: Running of postscripts has completed.

# egosh resource list
NAME     status       mem    swp    tmp   ut    it    pg   r1m  r15s  r15m  ls
node001 ok         1247M  3882M    14G  29%     0   0.0   0.3   0.5   0.4   1
node002 ok          690M  1023M  4880M  11% 12984   0.0   1.0   0.3   0.3   0
node003 unavail 
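As a minimal sketch, the unavailable hosts can be picked out of this listing with awk. The example below works on a trimmed copy of the output shown above rather than calling egosh, and it assumes that the host name is the first column and the status the second, as in that output.

```shell
#!/bin/sh
# Trimmed copy of the `egosh resource list` output shown above.
# On a live cluster this text would come from the command itself.
sample_output='NAME     status       mem    swp    tmp   ut    it    pg   r1m  r15s  r15m  ls
node001 ok         1247M  3882M    14G  29%     0   0.0   0.3   0.5   0.4   1
node002 ok          690M  1023M  4880M  11% 12984   0.0   1.0   0.3   0.3   0
node003 unavail'

# Skip the header row and print the name of every host whose status is "unavail".
unavailable=$(printf '%s\n' "$sample_output" | awk 'NR > 1 && $2 == "unavail" { print $1 }')

echo "unavailable hosts: $unavailable"
```

With the sample text above, the only host reported is node003.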


The startego script first checks whether lim is already running, and starts isflim only if ps -ef | grep -c lim returns a value less than or equal to 1. If the LSF lim is running, the command returns 2, so startego does not start isflim.
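The counting behaviour described above can be reproduced without a live cluster. The sketch below is not the startego script itself; it feeds simulated ps -ef text to the same grep -c lim count and applies the same threshold. The second sample line is a hypothetical entry standing in for the grep process, which presumably accounts for the baseline count of 1.

```shell
#!/bin/sh
# Simulated `ps -ef` output on a node where the LSF lim is running.
# The second line is a hypothetical entry for the grep process itself,
# whose command line also contains the string "lim".
sample_ps='root 2342 1 0 May21 ? 00:01:50 /usr/share/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/lim
root 9999 9000 0 10:00 pts/0 00:00:00 grep -c lim'

# Count every line that contains "lim", as the check described above does.
count=$(printf '%s\n' "$sample_ps" | grep -c lim)
echo "matching lines: $count"

# Apply the threshold described above: start only if the count is <= 1.
if [ "$count" -le 1 ]; then
    decision="start"
else
    decision="skip"
fi
echo "decision: $decision"
```

Because the match is on the bare string lim, the check cannot tell the LSF lim apart from any other process whose command line contains that string.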

Diagnosing the problem

Check whether an LSF lim process is already running on the compute node:

# ps -ef | grep lim | grep -v grep
root 2342 1 0 May21 ? 00:01:50 /usr/share/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/lim
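The same check can be scripted. The following is a rough sketch only: has_lsf_lim is a hypothetical helper, and it assumes that the LSF lim binary lives under a path containing /lsf/ and ending in /etc/lim, as in the output above. It reads ps -ef style text on standard input, so the example runs on the sample line instead of a live process list.

```shell
#!/bin/sh
# Sample line copied from the ps output shown above.
sample_ps='root 2342 1 0 May21 ? 00:01:50 /usr/share/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/lim'

# Hypothetical helper: succeeds if any line on stdin looks like an LSF lim process.
# Assumption: the LSF install path contains "/lsf/" and the binary is ".../etc/lim".
has_lsf_lim() {
    grep -v 'grep' | grep -q '/lsf/.*/etc/lim$'
}

if printf '%s\n' "$sample_ps" | has_lsf_lim; then
    lsf_lim_found=yes
else
    lsf_lim_found=no
fi
echo "LSF lim running: $lsf_lim_found"
```

On a compute node, the sample text would be replaced by the real ps -ef output piped into the helper.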

Resolving the problem

Create a script on the xCAT management node with the following content, e.g. /tmp/

# set environment variables
. "/opt/platform/kernel/conf/profile.ego"
# start ego agent
echo "Starting ego agent..."
echo "ego agent is started."
exit 0

Change the file permissions to 755:
# chmod 755 /tmp/

Run the script remotely on the compute node:
xdsh <noderange> -e /tmp/

Cross reference information

Segment: IBM Spectrum Computing

Product: IBM Spectrum Cluster Foundation

Document information

More support for: Platform Cluster Manager

Software version: 3.2, 4.1.0

Operating system(s): Linux

Software edition: Advanced

Reference #: T1019604

Modified date: 18 June 2013