rexec() function problem

Hi folks,

I'm trying to write a reconnection algorithm using rexec(), but I noticed that once rexec() fails and returns -1, it never succeeds again until I restart the program or the thread.

For example, I have an endless loop for connection retries. If I supply a wrong password for a given username, rexec() fails, which is fine. But my program keeps retrying by calling rexec(), so if I then change that user's password on the host to the one rexec() is sending, it should connect successfully. Instead I keep getting the same login error. The curious thing is that if I restart my program, or even just end the thread and start it again, it connects successfully.

I think it's a bug in rexec(). Has anyone had a similar problem who can help me?

Because of this problem I'm writing code just to restart the threads that failed in rexec(), but it's getting much more complex than it should be.

Thanks in advance for any help.

What OS are you using? Both SunOS and HP-UX have rexec man pages that say this:

A couple of things...

First, the rexec() function call is obsolete in some
environments and you should now be using rcmd().

Second, rexec() is not safe for multithreaded
programs.

When rexec() fails, the network (socket)
connection is closed. Are you creating a new
socket connection after each failure?

I'm running Digital UNIX V4.0D (Rev. 878)

I looked at the man page for rexec and it does not say anything about thread problems. In fact, we have no problems using rexec with threads once rexec connects successfully. The only problem is when rexec fails to connect.

The connection code looks like this:

do
{
    *pifdComm = rexec((char **)&pszInitiatorIpAddress, ptServInfo->s_port,
                      ptConfig->pacInitiatorUsername, ptConfig->pacInitiatorPasswd,
                      ptConfig->pacInitiatorCommand, 0);
    if (0 > *pifdComm)
    {
        /* Connection failure message */
    }
} while (0 > *pifdComm);

I do not open any socket connection myself; I just call rexec and use the file descriptor it returns to send and receive commands from the host.

I will try rcmd() and see whether it has the same problem. I haven't tried this in a non-multithreaded application yet, but I'll try that too.

Thanks folks, if you have any suggestions let me know.

I just tested rexec() in a non-multithreaded application and got the same problem: after it fails once, it does not connect successfully again, even after I change the password on the host.

I tried the rcmd() function, but it is for root users only; I got a "permission denied" error.

Here is the code I wrote to test.

#include <stdio.h>
#include <string.h>   /* strcpy */
#include <netdb.h>    /* getservbyname */

#define SERVER_NAME "server"
#define USER_NAME "username"
#define USER_PASSWD "password"
#define REXEC_CMD "command"

int main(void)
{
    struct servent *ptServInfo;
    int ifdComm;
    char szInitiatorIpAddress[128];
    char *pszInitiatorIpAddress;

    if ( (ptServInfo = getservbyname("exec", "tcp")) == NULL)
    {
        printf("\nFailed getservbyname\n");
        return 1;
    }
    strcpy(szInitiatorIpAddress, SERVER_NAME);
    pszInitiatorIpAddress = szInitiatorIpAddress;

    do
    {
        ifdComm = rexec((char **)&pszInitiatorIpAddress, ptServInfo->s_port, USER_NAME, USER_PASSWD, REXEC_CMD, 0);
        if (0 > ifdComm)
        {
            printf("Connection failed\n");
        }
    } while (0 > ifdComm);

    printf("\nConnected successfully\n");
    return 0;
}

I work on the same server as you do, and the code works perfectly fine here. Just to be sure, did you check that your /etc/hosts entry matches your SERVER_NAME? What I mean is, are you changing the password on the right system?

Hi Shaik,

Thanks for testing the code... I found out what is going wrong, but I don't understand why.

When you said it worked there I thought "he must be kidding" :), because the server I wrote in the SERVER_NAME parameter was right and it was in the /etc/hosts file. But to make sure I was doing the right thing, I added a line to the code to print the server name before calling rexec, and then to my surprise I discovered what was happening.

The first time, the server name was right: it was the one in the SERVER_NAME constant. The second time, the server name was different: it was another alias for the same IP in /etc/hosts. But the third time, and from then on, the server name was an alias for a different IP. That's why it never connected again, even after I changed the password.

But I don't understand why it is getting a server name that belongs to another IP address.

Here is a piece of my /etc/hosts.

xxx.xxx.0.1 localhost
xxx.xxx.2.20 vicunha1
xxx.xxx.72.21 sscxwin1
xxx.xxx.72.22 sscsp02
xxx.xxx.72.9 sscsp11
xxx.xxx.2.24 bscs
xxx.xxx.16.34 uatr1dfr
xxx.xxx.2.56 sdrdig14
xxx.xxx.2.56 vicunha14
xxx.xxx.72.25 sscsp05.ac.com sscsp05 #System name
xxx.xxx.72.92 AC7932CY570122

I was trying to connect to the server "sscsp05", so the first time pszInitiatorIpAddress pointed to this string, the second time it held "sscsp05.ac.com", and the third time it held "sscxwin1", which has a different IP address.

I tested it by trying to connect to the server "sscsp02", and the same thing happened, except that by the second time it already had "sscxwin1".

So, to solve this, I now set pszInitiatorIpAddress to point to szInitiatorIpAddress again before every call to rexec. I also tested using 10 threads, and all of them connected successfully.
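
A minimal sketch of that workaround, assuming the only thing that needs fixing is the host pointer handed to rexec. The names connect_with_retry, rexec_fn, fake_rexec and max_tries are hypothetical and only for illustration; fake_rexec is a stand-in that imitates the behaviour I observed (it redirects the caller's pointer to another name and fails the first two calls) so that the loop can be exercised without a network. Port 512 is the usual exec/tcp port. In the real program the function passed in would be rexec itself, provided its prototype on your system matches the typedef.

```c
#include <stdio.h>
#include <string.h>

/* Same shape as rexec(); a function pointer lets us pass in a stand-in. */
typedef int (*rexec_fn)(char **ahost, int inport, const char *user,
                        const char *passwd, const char *cmd, int *fd2p);

/* Hypothetical stand-in for rexec(): it redirects the caller's host
 * pointer to a different name, fails the first two calls, and only
 * succeeds when it is asked for "sscsp05". */
int fake_calls = 0;

int fake_rexec(char **ahost, int inport, const char *user,
               const char *passwd, const char *cmd, int *fd2p)
{
    static char other_name[] = "sscxwin1";
    int right_host = (strcmp(*ahost, "sscsp05") == 0);

    (void)inport; (void)user; (void)passwd; (void)cmd; (void)fd2p;
    fake_calls++;
    *ahost = other_name;          /* imitate the pointer being changed */
    if (!right_host)
        return -1;                /* wrong host: never connects */
    return (fake_calls >= 3) ? 3 : -1;
}

/* Retry up to max_tries times. The host pointer is reset to our own
 * buffer before every attempt, so whatever the previous call left in
 * it is never reused. Returns the descriptor, or -1 if all tries fail. */
int connect_with_retry(rexec_fn do_rexec, const char *server, int port,
                       const char *user, const char *passwd,
                       const char *cmd, int max_tries)
{
    char host_buf[128];
    int fd = -1;
    int i;

    for (i = 0; i < max_tries && fd < 0; i++) {
        char *host;

        /* Fresh copy of the name and fresh pointer on each attempt. */
        strncpy(host_buf, server, sizeof host_buf - 1);
        host_buf[sizeof host_buf - 1] = '\0';
        host = host_buf;

        fd = do_rexec(&host, port, user, passwd, cmd, NULL);
        if (fd < 0)
            printf("Connection failed (attempt %d)\n", i + 1);
    }
    return fd;
}
```

With the stand-in, connect_with_retry(fake_rexec, "sscsp05", 512, "username", "password", "command", 5) fails twice and then returns 3. A real retry loop would probably also want a sleep between attempts.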

Do you have any idea why rexec is picking up other server names from /etc/hosts? I looked at the man page and it doesn't say anything about changing the server name dynamically.

Thank you.