Remote login and running a script on multiple servers

weddy · March 5, 2014, 8:44pm

Hi all,
I am baffled by this. These are Solaris/IRIX systems.
I have 4 servers, all connected to one another. I need to write a script line that will log in to servers 1-3 ($HOST), start a script in the background, and log off while the background script runs over a length of time.

Say, a script that contains

pmem >> outputfile.txt

It has more lines than that, but they all write their output to a file on the same server.

Right now we can rlogin and start the script, but the script uses output and input feeds, which makes this difficult to deal with, because logging off kills the script. I don't get it either, but somehow it does.

Restrictions:
I am only allowed to use tcsh or sh (sh preferred). I can't use ssh; it's not implemented on the servers, nor can it be. I can't use, or pass to any script, IP addresses, passwords, or user names. The machines are linked, so I can use rlogin hostname to get into any machine via xterm. This is not a new setup; it's been running for years.

My idea so far, though I'm new at this:

rsh $HOST 'nohup $process.sh >/dev/null > 2>&1 &'

This line does not work.

Thanks in advance. I will try to get back here as much as possible to answer questions.

Don_Cragun · March 5, 2014, 10:31pm

And when you run the command:

rsh $HOST 'nohup $process.sh >/dev/null > 2>&1 &'

What on the system identified by $HOST is going to define the variable named process?

If you want to know what is going wrong, why are you redirecting all diagnostic messages to /dev/null?

weddy · March 5, 2014, 10:56pm

Hi Don,
What we are trying to create is a master script that runs other scripts in the background on different machines.

To answer your questions:

$HOST

holds the host names of the 3 machines: ws1, ws2, and ws3. And

$process.sh

is the script name. The script is located on the NIS server, which is the 4th machine and the one making the initial call, so all machines can see and access the script without it being copied to each machine, which saves needed space.

Hope that answers the questions. Now, I have gotten some interesting results with the rsh code:

The hosts are configured with csh, which complains that stty is not supported in transit (not sure what I just said, but I hope it helps),
and I get an "Ambiguous output redirect." error.
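A likely cause of the "Ambiguous output redirect." error, assuming the remote login shell really is csh: rsh hands its command string to the remote user's login shell, and csh does not understand the Bourne-style 2>&1. Below is a minimal sketch (not tested on Solaris or IRIX) that wraps the remote command in /bin/sh -c so that only sh parses the redirections, and uses rsh -n so rsh does not read local stdin. The function name start_remote, the RSH override variable, and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch: start a script in the background on a remote host whose login
# shell may be csh. RSH defaults to rsh; set RSH to another command
# (e.g. a stub) to see the arguments without contacting a host.
RSH=${RSH:-rsh}

start_remote() {
    host=$1     # e.g. ws1host
    script=$2   # full path on the shared NIS mount (hypothetical)
    # Outer double quotes: $script expands here, on the local machine.
    # Inner single quotes: csh passes the whole string to /bin/sh untouched,
    # so "2>&1" is parsed by sh instead of causing "Ambiguous output redirect."
    # All three streams are redirected so nothing keeps the rsh session open.
    $RSH -n "$host" "/bin/sh -c 'nohup $script </dev/null >/dev/null 2>&1 &'"
}

# Example (hypothetical host and path):
# start_remote ws1host /nis/share/bin/monitor.sh
```

The stty complaints usually come from stty commands in ~/.cshrc, which csh reads even for the non-interactive shell that rsh starts; wrapping them in `if ( $?prompt ) then ... endif` skips them when there is no terminal.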

@Don, you mentioned the redirect to /dev/null. The short answer is that we don't care what the output of the call is; the script writes to different files, and those files are what we are after. But I think you're right about having it directed to /dev/null.

The rsh call stops on ws4, creates a PID, and waits forever (the script has an endless loop that tracks memory); I have to suspend it and kill the PID to stop it. Again, the loop is not the problem, but the master script stopping there is.
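The hang is consistent with how rsh behaves: it keeps the session open until every process on the remote side has closed the stdout and stderr it inherited, so a background job that still holds them keeps rsh waiting. The sketch below reproduces the same effect locally with no rsh involved, using a command substitution (which also reads until end-of-file) as a stand-in for rsh; the 3-second sleep stands in for the endless monitoring loop. The analogy is an assumption, not a trace of rsh itself.

```shell
#!/bin/sh
# Local analogy for the rsh hang: $(...) reads its pipe until EOF, much as
# rsh waits for the remote side to close stdout/stderr.

start=$(date +%s)
# The background sleep inherits the pipe, so $(...) waits until it exits.
held=$(sh -c 'sleep 3 &'; echo started)
mid=$(date +%s)
# The background sleep gets its own streams, so $(...) returns at once.
freed=$(sh -c 'sleep 3 </dev/null >/dev/null 2>&1 &'; echo started)
end=$(date +%s)

echo "streams inherited:  waited $((mid - start)) s"
echo "streams redirected: waited $((end - mid)) s"
```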

If you have any ideas I can research, please throw them at me too. Thanks. On-the-job-training scripting, got to love it.

Don_Cragun · March 5, 2014, 11:37pm

You didn't even come close to answering my question. In the script:

rsh $HOST 'nohup $process.sh >/dev/null > 2>&1 &'

The rsh utility runs the command nohup $process.sh >/dev/null > 2>&1 & on the host specified by the expansion of $HOST. Since the string $process is given to rsh inside single quotes, $process will be expanded on the system specified by the expansion of $HOST, not on the system where you run the rsh command.

I repeat my question: What defines the value that $process will expand to on the hosts ws1, ws2, and ws3?

If process is being defined on the system where you're running rsh and not on ws1, ws2, and ws3, the command you're running on those systems is:

nohup .sh >/dev/null > 2>&1 &

Do you have any proof that the program that you want to run on ws1, ws2, and ws3 has been started?

Try running:

rsh $HOST 'nohup $process.sh >/dev/null > &'

which is likely to be a syntax error, but now you stand a chance of seeing the syntax error. Then try running:

rsh $HOST 'nohup $process.sh >/dev/null &'

which doesn't have any syntax errors, but I'm expecting you'll see an error message something like:

.sh not found

Then try running:

rsh $HOST "nohup $process.sh >/dev/null > 2>&1 &"

which will expand both $HOST and $process on the local system.
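The quoting rule described above can be checked locally. In this sketch, sh -c stands in for the shell that rsh starts on the remote host (an assumption for testing; like the real remote shell, it does not see the unexported local variable), and /tmp/abc is a made-up value for process.

```shell
#!/bin/sh
unset process                # make sure it is not inherited from the environment
process=/tmp/abc             # hypothetical value, set locally and NOT exported

# Single quotes: the child shell receives the literal text $process.sh;
# process is undefined there, so it expands to nothing.
single=$(sh -c 'echo nohup $process.sh')

# Double quotes: $process is expanded here, before the child shell starts.
double=$(sh -c "echo nohup $process.sh")

echo "single quotes ran: $single"    # single quotes ran: nohup .sh
echo "double quotes ran: $double"    # double quotes ran: nohup /tmp/abc.sh
```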

weddy · March 6, 2014, 8:39am

Hi Don,

So

$process

is the .sh script:

$process.sh

I think you mistook the variable holding the script name for a name variable followed by a script; they are both the same. Of course I could put the hard-coded name in there, but it's a script running it, and with proper programming you should use variables in case the name changes. Sorry for any misunderstanding.

Correct, this is what we want done: one script telling another script to run in the background on another machine, not on the originator.

Yes, they are all in the same room, and I can just run

ps -ef | grep  <process name>

on each of workstations 1-3. In addition, I can run the rsh through the xterm; however, it hangs open on the main computer (ws4, workstation 4), so the master script stops after ws1 and never runs the script on ws2 or ws3.

I'm confused by this; can you elaborate? Why will it run on the local system? Or will the process call run on the local system while the process script runs on the host system?

Thanks.

Don_Cragun · March 6, 2014, 4:03pm

Let me try once more.

Text inside single quotes such as $process in the command:

rsh $HOST 'nohup $process.sh ... &'

will not be expanded on the local system.

HOST=ws1
process=abc
rsh $HOST 'nohup $process.sh >/dev/null &'

will run the command:

nohup $process.sh > /dev/null &

on ws1. However, if you use double quotes instead of single quotes:

HOST=ws1
process=abc
rsh $HOST "nohup $process.sh >/dev/null &"

will run the command:

nohup abc.sh > /dev/null &

on ws1.

I repeat for the last time: Since you are using single quotes, how is the variable process defined on the remote systems so that it will be an exported variable in the environment of the shell started by rsh on those remote systems?
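The point about exported variables can be seen in a small local sketch: a child shell (sh -c here, standing in for the shell rsh starts remotely) only sees a variable that was exported. The value /tmp/abc is made up. Note that, as far as I know, rsh does not forward the caller's environment at all, so even exporting process on ws4 would not define it on ws1-ws3; it would have to be defined on the remote side or expanded into the command string locally, as the double-quoted form does.

```shell
#!/bin/sh
unset process                 # start clean
process=/tmp/abc              # hypothetical value

# Set but not exported: a child shell does not see it.
before=$(sh -c 'echo "[$process]"')

export process
# Exported: the child inherits it through its environment.
after=$(sh -c 'echo "[$process]"')

echo "not exported: $before"   # not exported: []
echo "exported:     $after"    # exported:     [/tmp/abc]
```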

ahmedwaseem2000 · March 6, 2014, 5:46pm

don cragun:

That's a good explanation that makes people learn. You seldom find people trying to help others learn.

weddy · March 6, 2014, 6:14pm

In the running script, $HOST is short for workstation${i} in a for loop where i starts at 1 and counts while i<4.
process is the variable defined as process=/a path name/script name.sh

The problem is that csh does not support stty operations in transit, and
the output redirect is ambiguous.
I see what you mean about the single and double quotes. I have to see what it's doing.
Thanks,
sorry, I guess I didn't quite understand when you were talking about expanding. I'll get back to you with results and what I find out.

weddy · March 8, 2014, 6:15pm

It does not work the way I want it to. Csh is doing what it does best: keeping the script running when you log out of the xterm. Plus, I don't think you need the extra baggage of 2>&1 to get this accomplished. You get 3 dtexec PIDs on the local machine, which in my case is undesirable and unwanted. That said, the process does start on the remote systems with the script running in the background, along with the extra load of the zombie dtexec processes.
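On the 2>&1 point, a hedged counterpoint: rsh relays the remote command's stderr over a separate channel and, as far as I know, does not finish until that channel is closed as well, so redirecting only stdout can still leave it waiting. The local sketch below imitates this by capturing both streams of a child shell (the outer 2>&1 plays the role of rsh's stderr channel); the 2-second sleep is a stand-in for the long-running script.

```shell
#!/bin/sh
# Local stand-in for rsh collecting both stdout and stderr of the remote
# command: the outer 2>&1 puts the child's stderr on the same pipe.

t0=$(date +%s)
# Only stdout redirected: the background sleep still holds stderr (the pipe),
# so $(...) waits for it to exit.
out1=$(sh -c 'sleep 2 >/dev/null &' 2>&1)
t1=$(date +%s)
# stdout and stderr both redirected: $(...) returns immediately.
out2=$(sh -c 'sleep 2 >/dev/null 2>&1 &' 2>&1)
t2=$(date +%s)

echo "stdout only:     waited $((t1 - t0)) s"
echo "stdout + stderr: waited $((t2 - t1)) s"
```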

Thanks, Don, for the help. I will have to figure something else out, or find a way to get rid of the zombies ("walkers").

Below is the code I wanted to use, but I only tried the rsh segment to get results. Please note that some segments are missing due to company rules.

#!/bin/sh

# this is to start a process remotely and kill the process remotely

ScriptAndPath= #left out see below

# add insurance that $USER is *** (the required user name is redacted)
if [ "$USER" != "***" ]; then
    echo "You must be *** to run correctly with start & stop"
    exit 1
fi
#
# intentionally left out due to community overview.


case $1 in
    'start')
        for i in 1 2 3
            do
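                # /bin/sh -c keeps the redirections away from a csh login shell;
                # redirecting all three streams lets rsh return right away
                rsh -n ws${i}host "/bin/sh -c 'nohup ${ScriptAndPath} </dev/null >/dev/null 2>&1 &'"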
            done

        $ScriptAndPath #start on local system
    ;;

    'stop')

    # Search for processes that you've started based on the file write path
        for i in 1 2 3 4
        do
            rlogin ws${1}host
            sleep 8 #sleep for 8 sec
                # this should kill all process PID's for both script and dtexec
                for KILLPID in `ps -ef | grep proc | awk '{print $2}'`; do
                    kill -9 $KILLPID
                done
            exit
            sleep 8
        done            
    ;;
    
    *)
        echo "USAGE: $0 <start|stop>"
    ;;
esac

Don_Cragun · March 9, 2014, 6:40am

There are a couple of strange things in this loop in the 'stop' case in your code:

    # Search for processes that you've started based on the file write path
        for i in 1 2 3 4
        do
            rlogin ws${1}host
            sleep 8 #sleep for 8 sec
                # this should kill all process PID's for both script and dtexec
                for KILLPID in `ps -ef | grep proc | awk '{print $2}'`; do
                    kill -9 $KILLPID
                done
            exit
            sleep 8
        done            
    ;;

In your 'start' case, you started processes on ws1host, ws2host, ws3host, and the local system. But here in your 'stop' case, you seem to want to rlogin to wsstophost four times and manually enter commands to be run on that host. You haven't told us what you typed into the first rlogin session, but the exit in the loop after the first rlogin session will terminate your script.

Assuming that the local system is ws4host , wouldn't the following come closer to doing what you said you want to do:

    # Search for processes that you've started based on the file write path
        for i in 1 2 3 4
        do
            rlogin ws${i}host <<"EOF"
                sleep 8 #sleep for 8 sec # I see no need for this line
                # this should kill all process PID's for both script and dtexec
                for KILLPID in `ps -ef | grep proc | awk '{print $2}'`; do
                    kill -9 $KILLPID
                done
                exit
EOF
            sleep 8 # I see no need for this line
        done            
    ;;

Changing ${1} to ${i} would seem to be crucial. Using a here-document to feed commands into rlogin would seem to be crucial. Trying to kill off every command on the system that happens to match the string proc seems dangerous. (Are you really sure that this string won't occur in the ps output from other processes that you don't intend to kill off?)
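Following up on the warning about matching the string proc: one way to narrow the match, assuming each target process has the script's full path as a separate word on its command line (true when it is started as /path/script.sh or sh /path/script.sh), is to compare whole words with awk instead of a substring grep. That also keeps the pipeline from matching itself, and the sketch skips the calling shell's own PID. The function names are hypothetical; ps -e -o pid= -o args= is the POSIX ps output format, which I believe Solaris ps supports, though I have not checked IRIX. It sends SIGTERM rather than kill -9 so the script gets a chance to clean up.

```shell
#!/bin/sh
# Print the PIDs of processes whose command line contains the given path
# as a whole word (not merely as a substring), excluding this shell itself.
find_script_pids() {
    ps -e -o pid= -o args= | awk -v path="$1" -v self="$$" '
        $1 != self {
            for (f = 2; f <= NF; f++)
                if ($f == path) { print $1; next }
        }'
}

# Send SIGTERM to every matching process.
stop_script() {
    for pid in $(find_script_pids "$1"); do
        kill "$pid" 2>/dev/null   # the PID may already be gone
    done
}

# Example (hypothetical path):
# stop_script /nis/share/bin/monitor.sh
```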

weddy · March 9, 2014, 8:52am

Hi Don,

The first one was a typo, nice catch. I changed it in the original document just in case I ever want to go back and use it as a reference.

I can't put specific names on public forums. It is a script and its path, so you get a PID for the script. I am now trying to stop this script via the PID.
The exit command was for the rlogin, but you're right, I hadn't thought that far ahead in the script. I added the rest to the post so people can at least see what we're talking about.
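Since the goal is to stop the script via its PID, another option, sketched under the assumption that /tmp is local to each workstation, is to record the PID when the script is started and read it back when stopping. The PIDFILE location and the function names are hypothetical; on the remote hosts, start_job would have to run inside the command that rsh starts. This only tracks the one process that was started; any helper processes it spawns on its own would need separate handling.

```shell
#!/bin/sh
# Hypothetical PID file; /tmp is assumed to be local to each workstation
# (not the shared NIS mount), so each host keeps its own record.
PIDFILE=${PIDFILE:-/tmp/monitor.pid}

# Start the given script in the background and remember its PID.
start_job() {
    nohup "$1" </dev/null >/dev/null 2>&1 &
    echo "$!" > "$PIDFILE"
}

# Stop the process recorded in the PID file, if it is still running.
stop_job() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    # kill -0 sends no signal; it only checks the process can be signalled
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"          # SIGTERM
    fi
    rm -f "$PIDFILE"
}
```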

I have to do some reading on this. Good tip, thanks.

sleep 8 # I see no need for this line

This allows the connection to become stable before sending commands; there are other processes that have to perform their tasks before you can send a task/command. For functional testing, 8 is standard here. Don't ask me why; I don't know.

It works for testing for now, and remember I can't post names; when I get more results on what starts up, I'll get more specific. I can see it starts a bunch of daemon threads, which is very undesirable; it was meant to kill them as well as the dtexec threads. I believe I started with the

pidof proc<name>

command, but it did not show the daemon threads.

Thanks again, Don, you are a great monitor and mentor.

As you can see, testing is everything when you get to play with this guy.