startego fails to start isflim if LSF is already running

Technote (troubleshooting)


Problem(Abstract)

startego fails to start isflim if LSF is already running

Symptom

PCMAE runs /etc/init.d/startego on a compute node to start isflim so that the node can join EGO.

However, the script fails to start isflim if LSF is already running on the node, even though the script output reports that the ego agent has started. egosh resource list then shows the node as unavail.

node003: ego agent has been started.
node003: ego agent is started.
node003: Running of postscripts has completed.


# egosh resource list
NAME     status       mem    swp    tmp   ut    it    pg   r1m  r15s  r15m  ls
node001 ok         1247M  3882M    14G  29%     0   0.0   0.3   0.5   0.4   1
node002 ok          690M  1023M  4880M  11% 12984   0.0   1.0   0.3   0.3   0
node003 unavail 
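
As a rough sketch, the affected nodes can be picked out of this output by filtering on the status column. The listing below is a shortened copy of the sample output above, held in a variable so the snippet runs on its own; on a live cluster the same awk filter would presumably be fed from egosh resource list instead. The variable names are illustrative only.

```shell
#!/bin/sh
# Shortened copy of the sample "egosh resource list" output shown above.
sample_output='NAME     status       mem    swp    tmp   ut    it    pg   r1m  r15s  r15m  ls
node001 ok         1247M  3882M    14G  29%     0   0.0   0.3   0.5   0.4   1
node002 ok          690M  1023M  4880M  11% 12984   0.0   1.0   0.3   0.3   0
node003 unavail'

# Skip the header row and print the name of every host whose status is "unavail".
unavail_hosts=$(printf '%s\n' "$sample_output" | awk 'NR > 1 && $2 == "unavail" { print $1 }')

echo "$unavail_hosts"
```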

Cause

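The startego script first checks whether a lim process is already running, and starts isflim only if ps -ef | grep -c lim returns a value less than or equal to 1. The count typically includes the grep command itself, so it is 1 when no lim is running. If the LSF lim is running, its command line also matches lim, the count becomes 2, and startego therefore does not start isflim.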
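
The counting behavior can be illustrated with a small sketch. It uses a canned process listing rather than live ps output: the first line is the LSF lim entry from this technote, and the second is a hypothetical entry for the grep command itself (its PID and timestamps are made up). The stricter awk check is only one possible way to look for isflim specifically, and is not what startego does.

```shell
#!/bin/sh
# Canned process listing: the LSF lim from this technote plus a
# hypothetical line for the grep command itself. isflim is NOT running.
sample_ps='root 2342 1 0 May21 ? 00:01:50 /usr/share/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/lim
root 4711 4700 0 10:02 pts/0 00:00:00 grep -c lim'

# Substring match, as in the check described above: counts the LSF lim
# and the grep command, so the result is greater than 1.
loose_count=$(printf '%s\n' "$sample_ps" | grep -c lim)

# Stricter alternative (an assumption, not the shipped logic): count only
# processes whose command (field 8) has the base name "isflim".
strict_count=$(printf '%s\n' "$sample_ps" |
    awk '{ n = split($8, part, "/"); if (part[n] == "isflim") c++ } END { print c + 0 }')

echo "loose=$loose_count strict=$strict_count"
```

With this listing the substring count is 2, which is why the script skips isflim, while the stricter check finds no isflim process.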

Diagnosing the problem

An LSF lim process is already running on the node:

# ps -ef | grep lim | grep -v grep
root 2342 1 0 May21 ? 00:01:50 /usr/share/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/lim


Resolving the problem

Create a script on the xCAT management node with the following content, for example /tmp/startego.sh:

#!/bin/sh
# Start the EGO agent (isflim) directly, without the lim check done by startego
EGO_TOP="/opt/platform"
# Set the EGO environment variables
. "$EGO_TOP/kernel/conf/profile.ego"
# Start the ego agent under this host's name
THISHOSTNAME=$(hostname)
export VIRTUAL_HOSTNAME="$THISHOSTNAME"
echo "Starting ego agent..."
isflim
echo "ego agent is started."
exit 0

Change the file permissions to 755:
# chmod 755 /tmp/startego.sh

Run the script remotely on the affected compute nodes:
xdsh <noderange> -e /tmp/startego.sh

Document information


More support for:

Platform Cluster Manager

Software version:

3.2, 4.1.0

Operating system(s):

Linux

Software edition:

Advanced

Reference #:

T1019604

Modified date:

2013-06-18
