AIX calling WINSOCK during e-mail - normal?

ctote · January 20, 2011, 2:33pm

Hey everyone,

I'm completely stumped on this. An AIX machine I'm working on is attempting to send email, but the SMTP connection is failing. I have no idea what this code does or if it should even work. If someone could give me a hand, or a suggestion on what else to use, I would appreciate it.

retval = connect (the_socket, (struct sockaddr *) & sa_in,
                     sizeof(struct sockaddr_in));

I can't debug this: when I try to step into connect(), the debugger appears to step over it.

#if INCL_WINSOCK_API_PROTOTYPES
WINSOCK_API_LINKAGE
int
WSAAPI
connect(
    IN SOCKET s,
    __in_bcount(namelen) const struct sockaddr FAR * name,
    IN int namelen
    );
#endif /* INCL_WINSOCK_API_PROTOTYPES */

citaylor · January 20, 2011, 2:48pm

I would try printing the value of "errno" to find out why the socket didn't connect.
I would also have a look at "the_socket" to make sure it is >= 0.
Then I would also look at the contents of sa_in (which should be a struct sockaddr_in).
My guess from the WINSOCK comment is that this is a ported Windows application. If so, the way sa_in is filled in can matter: Intel is little endian and POWER is big endian, so, for example, a missing "htons" on the port could make a difference on only one platform.
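
To make the byte-order point concrete, here is a small sketch. The helper name port_bytes is made up for illustration; it is not from the code in question. It passes a port through htons() and copies out the two bytes as they sit in memory, which should come out high byte first on both Intel and POWER.

```c
#include <arpa/inet.h>  /* htons */
#include <stdint.h>     /* uint16_t */
#include <string.h>     /* memcpy */

/* Hypothetical helper: store the in-memory bytes of a network-order
 * port in out[0] and out[1] so they can be inspected. */
void port_bytes(uint16_t host_port, unsigned char out[2])
{
    uint16_t net_port = htons(host_port);  /* host order -> network order */
    memcpy(out, &net_port, sizeof net_port);
}
```

For port 25 the two bytes are 0x00 0x19 on either platform. Without htons(), a little-endian machine would store 0x19 0x00, which the network reads as port 6400.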

I hope this helps...

ctote · January 20, 2011, 2:53pm

Ok, I'll do that. I know the_socket was 19, and the failed connect() gives me an errno of 78. Do you know of another way I can use this data to connect to an SMTP server?

Corona688 · January 20, 2011, 3:22pm

None of that winsock stuff matters unless INCL_WINSOCK_API_PROTOTYPES is defined -- it's probably safe to ignore. Windows does in fact have a socket API that vaguely resembles the expected standard, so the code could have been ported as citaylor noted...

After it fails you can try perror("connection failed"); and it should print something like connection failed: no route to host or whatever to stderr.

connect() is usually a thin wrapper around a system call rather than an ordinary library function, meaning there's nothing happening inside your process to trace during connect() -- it's asleep. Some special instruction or software interrupt has transferred control to the kernel, which will wake it back up when it's done.

For more detail on the connect call, try man 2 connect

---------- Post updated at 02:22 PM ---------- Previous update was at 02:06 PM ----------

I found this example for Linux; I think it should be very close.

Here it is cleaned up and simplified a little.

    const char *host="smtp.whatever.com";
    short int port=25; // standard smtp port
    int sockfd, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    sockfd=socket(AF_INET, SOCK_STREAM, 0);
    server=gethostbyname(host); // look up host's IP address

    // copy the IP address and port into the address structure
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family=AF_INET;
    bcopy(server->h_addr,&serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port=htons(port); // htons() puts it in the right byte order
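    // connect() takes a generic struct sockaddr *, so cast the address
    if(connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
        perror("connection failed");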

ctote · January 20, 2011, 3:40pm

Amazing. Thanks for the help! I'm sure I'll have more questions, but this will definitely get me started!

---------- Post updated at 03:40 PM ---------- Previous update was at 03:30 PM ----------

What does connect() return? Is there somewhere I can view the possible values?

Scott · January 20, 2011, 3:56pm

Hi.

From the connect(2) man-page:

RETURN VALUES
     Upon successful completion, a value of 0 is returned.  Otherwise, a value of -1 is returned and the global
     integer variable errno is set to indicate the error.

Corona688 · January 20, 2011, 4:11pm

Yes, and the perror() function I suggested above checks errno to decide what to print. errno is a global variable that's set by system calls, so when almost any system call returns an error, perror() can tell you what it was.
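
Putting the return value and perror() together might look roughly like this. It is only a sketch: try_connect is a made-up name, and the caller is assumed to have filled in the address already.

```c
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* perror */
#include <sys/socket.h>  /* connect */

/* Hypothetical wrapper: returns what connect() returned (0 or -1)
 * and prints the reason to stderr on failure. */
int try_connect(int fd, const struct sockaddr_in *addr)
{
    int rc = connect(fd, (const struct sockaddr *)addr, sizeof *addr);
    if (rc == -1)
        perror("connection failed");  /* prints the text for the current errno */
    return rc;
}
```

Calling it with a descriptor that is not open, such as -1, is an easy way to see the failure path without touching the network.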

ctote · January 20, 2011, 4:55pm

Thanks for the help guys.

bcopy() doesn't show up as a usable method - I did find a reference to memcpy, but I get an invalid void 1st parameter error. Any idea what to try? I made sure I had #include <strings.h>

Corona688 · January 20, 2011, 5:03pm

bcopy works like bcopy(source, dest, length);
memcpy works like memcpy(dest, source, length);

So you have to swap the first two. Otherwise they're equivalent. I considered replacing that for you but decided I shouldn't mess with the original code too much. Sorry.
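
Side by side, the argument order looks like this. The two wrapper names are invented just for the comparison; both take source first so the difference shows up only in the call inside.

```c
#include <string.h>   /* memcpy */
#include <strings.h>  /* bcopy (legacy BSD interface) */

/* Hypothetical wrappers: both copy len bytes from src to dst. */
void copy_with_bcopy(const void *src, void *dst, size_t len)
{
    bcopy(src, dst, len);   /* source first, destination second */
}

void copy_with_memcpy(const void *src, void *dst, size_t len)
{
    memcpy(dst, src, len);  /* destination first, source second */
}
```

One small difference: bcopy() copes with overlapping buffers and memcpy() does not, so memmove() is the closer match when the buffers can overlap. That doesn't matter for copying an address into a sockaddr_in.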

"invalid void first parameter" is a new one on me. functions like memcpy and bcopy usually take void * so you can feed them any kind of pointer without the compiler whining. I suspect something else is wrong, maybe it doesn't like how I'm using the structure members. Can you print the exact line you have and the exact error you get?

[edit] memcpy is declared in string.h, not strings.h. That might do it.

ctote · January 20, 2011, 5:09pm

Sure it's:

Going to try switching the params now.

edit: Got the same error. Here's my line of code:

memcpy(&sa_in.sin_addr, hostentry->h_addrtype, hostentry->h_length);

Corona688 · January 20, 2011, 5:21pm

You forgot to give me the entire line that caused the error, too. None of that should have made it think it was an int, that's odd.

---------- Post updated at 04:21 PM ---------- Previous update was at 04:11 PM ----------

A little bit fell off somewhere or other.

memcpy(&serv_addr.sin_addr.s_addr, hostentry->h_addr, hostentry->h_length);

ctote · January 20, 2011, 5:27pm

Well, now I'm getting errors like:

But I can clearly see (on my windows box) that sa_in.sin_addr.S_un.S_addr exists. Here's my new line of code:

memcpy(&sa_in.sin_addr.S_un.S_addr, hostentry->h_addrtype, hostentry->h_length);

---------- Post updated at 05:25 PM ---------- Previous update was at 05:22 PM ----------

Ok I tried:

memcpy(&sa_in.sin_addr.s_addr, hostentry->h_addrtype, hostentry->h_length);

and now I get:

again. On my windows box, s_addr shows up as 364431568

---------- Post updated at 05:27 PM ---------- Previous update was at 05:25 PM ----------

heyooo, I think it worked this time. I tried h_addr instead of h_addrtype like you suggested, and it took. I get this now, but I don't think it's anything to be concerned about?

Corona688 · January 20, 2011, 5:29pm

Never use Windows documentation to program an AIX system! When I said it only vaguely adhered to the standard, I wasn't kidding. It's a bit skewed compared to what you get on a UNIX system.

Do you still have it as strings.h instead of string.h? That might cause that.

ctote · January 20, 2011, 5:33pm

Those were included before I started changing anything - if this works (I pray it does) I plan on going back and doing a bit of cleanup in this file. Hard to tell how old it is. I'll let you know how the results turn out (should know in the next hour or so)!

Corona688 · January 20, 2011, 6:03pm

If you're using memcpy but not including string.h, that is easily capable of causing a segmentation fault on some platforms.

Turns out there is a strings.h though, it defines bcopy.

ctote · January 21, 2011, 12:03pm

I didn't recompile everything I should have - going to have to try again tomorrow. It will take another hour or so to shut down the servers, recompile, and reboot them. I'll keep you updated though. Thanks again for the help!

---------- Post updated 01-21-11 at 11:51 AM ---------- Previous update was 01-20-11 at 06:54 PM ----------

So unfortunately, I'm still getting a -1 returned from connect(). Here's my full code:

int
connection::get_connected (char * hostname, char * service)
{
   struct hostent *    hostentry; /* from gethostbyname */
   struct servent *    serventry; /* from getservbyname */
   unsigned long ip_address;
   struct sockaddr_in    sa_in;
   int    our_port;
   struct linger NoLinger;
   int    retval, err_code;
   unsigned long    ioctl_blocking = 1;
   char    message[512];

   // if the ctor couldn't get a buffer
   if (!in_buffer || !out_buffer)
      return (ERR_CANT_MALLOC);

   // --------------------------------------------------
   // resolve the service name
   //

   // If they've specified a number, just use it.
   if (gensock_is_a_number (service))
   {
      char * tail;
      our_port = (int) strtol (service, &tail, 10);
      if (tail == service)
      {
         return (ERR_CANT_RESOLVE_SERVICE);
      }
      else
      {
         our_port = htons (our_port);
      }
   }
   else
   {
      // we have a name, we must resolve it.
      serventry = getservbyname (service, "tcp");

      if (serventry)
         our_port = serventry->s_port;
      else
         return (ERR_CANT_RESOLVE_SERVICE);
   }

   // --------------------------------------------------
   // resolve the hostname/ipaddress
   //
   // Assume only hostname
   //  if ((ip_address = inet_addr (hostname)) != INADDR_NONE) {
   //    sa_in.sin_addr.s_addr = ip_address;
   // }
   //  else {
   if ((hostentry = gethostbyname(hostname)) == NULL)
   {
      return (ERR_CANT_RESOLVE_HOSTNAME);
   }
   sa_in.sin_addr.s_addr = *(long *)hostentry->h_addr;
   // }


   // --------------------------------------------------
   // get a socket
   //

   if ((the_socket = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
   {
      return (ERR_CANT_GET_SOCKET);
   }

   sa_in.sin_family = AF_INET;
   sa_in.sin_port = our_port;

   // set socket options.  DONTLINGER will give us a more graceful disconnect

   NoLinger.l_onoff = 0;
   setsockopt(the_socket,
              SOL_SOCKET,
              SO_LINGER,
              (char *) &NoLinger, sizeof(NoLinger));

   // get a connection

   memset(&sa_in, 0, sizeof(sa_in));
   memcpy(&sa_in.sin_addr.s_addr, hostentry->h_addr, hostentry->h_length);
   sa_in.sin_port=htons(our_port);
   retval = connect (the_socket, (struct sockaddr *) & sa_in,
                     sizeof(sa_in));

   if (retval == SOCKET_ERROR)
   {
      return (ERR_CANT_CONNECT);
   }


#ifdef HA
   // Make this a non-blocking socket
   fcntl (the_socket, F_SETFL, O_NDELAY);
   // make the FD_SET and timeout structures for later operations...
#endif

   FD_ZERO (&fds);
   FD_SET (the_socket, &fds);

   // normal timeout, can be changed by the wait option.
   timeout.tv_sec = 0;
   timeout.tv_usec = 0;

   return (0);
}

I'm lost at this point.

---------- Post updated at 12:03 PM ---------- Previous update was at 12:02 PM ----------

I should mention also that when I try to do 'print hostentry->h_addr' it says that's not a valid member... but it compiles fine with gmake. So I'm not sure.

Corona688 · January 21, 2011, 3:12pm

I am also mystified. We've given you several methods of getting the actual error but you haven't used them. Until you do, your guess is as good as mine.
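
That said, two things in the listing look suspicious regardless of errno. First, sa_in.sin_family is assigned and then memset() wipes the whole structure, so the family is zero by the time connect() runs. Second, the port goes through htons() again after it was already converted (or came from getservbyname() already in network order). On a big-endian machine htons() does nothing, so the second point would only bite on Intel, but the first applies everywhere. Whether either one is the actual cause here is a guess. A sketch of an order that avoids both, using a made-up helper name fill_addr:

```c
#include <arpa/inet.h>   /* htons */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <string.h>      /* memset, memcpy */

/* Hypothetical helper: fill *sa for an IPv4 connect().
 * addr      - 4-byte address in network order (e.g. hostentry->h_addr)
 * host_port - port in host order (e.g. 25) */
void fill_addr(struct sockaddr_in *sa, const void *addr, unsigned short host_port)
{
    memset(sa, 0, sizeof *sa);         /* clear first ... */
    sa->sin_family = AF_INET;          /* ... then set every field */
    memcpy(&sa->sin_addr.s_addr, addr, sizeof sa->sin_addr.s_addr);
    sa->sin_port = htons(host_port);   /* convert exactly once */
}
```

The point of the ordering is simply that nothing assigned before the memset() survives it.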

ctote · January 21, 2011, 3:57pm

Oh shucks, it must have gotten lost somewhere along the way. When I print errno, I get 78. Or are you referring to something else?

---------- Post updated at 03:57 PM ---------- Previous update was at 03:19 PM ----------

Now I'm getting this errno for some reason:
ENOTEMPTY

Corona688 · January 21, 2011, 8:12pm

That means 'directory not empty'. It's likely totally unrelated to the bit of code you're playing with. If you call perror() too late, errno could have been overwritten by any other system call that happened along the way.
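
Here is a small sketch of how errno gets overwritten, and of the usual defence: copy it into a local variable right after the call that failed. The function name and the path are made up; close(-1) and the open() of a missing file are just two calls that fail for different reasons.

```c
#include <errno.h>   /* errno, EBADF, ENOENT */
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* close */

/* Hypothetical demo: returns the errno left by close(-1), even though
 * a later failing call replaces the global errno. */
int saved_errno_demo(void)
{
    int saved, fd;

    close(-1);       /* fails with EBADF */
    saved = errno;   /* capture it immediately */

    fd = open("/no/such/dir/file", O_RDONLY);  /* fails with ENOENT */
    if (fd >= 0)
        close(fd);   /* not expected to happen; path is assumed missing */

    return saved;    /* still EBADF; the global errno now says ENOENT */
}
```

A perror() placed after the second call would report the missing file, not the bad descriptor, which is the same kind of mix-up as seeing ENOTEMPTY after a failed connect().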

The code 78 you get makes a lot more sense: 'connection timed out' on AIX 4.3 and 5.1. It's also what perror() would be printing for you.

So the next obvious thing would be trying to connect to the server and port you're giving it by hand, with telnet or something. Maybe it really is not letting you connect on that port.

Can you post your code again? I have no idea what it looks like now.

ctote · January 24, 2011, 9:46am

Here is my updated code - it looks to be working:

int
connection::get_connected (char * hostname, char * service)
{

    int sockfd, portno, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    char buffer[256];

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd < 0) 
        error("ERROR opening socket");

    server = gethostbyname(hostname);
    portno = 25;

    if (server == NULL) 
    {
        fprintf(stderr,"ERROR, no such host\n");
        exit(0);
    }

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;

    bcopy((char *)server->h_addr, (char *)&serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port = htons(portno);

    if (connect(sockfd,(struct sockaddr *) &serv_addr,sizeof(serv_addr)) < 0) 
        error("ERROR connecting");

#ifdef HA
    // Make this a non-blocking socket
    fcntl (sockfd, F_SETFL, O_NDELAY);
    // make the FD_SET and timeout structures for later operations...
#endif

    FD_ZERO (&fds);
    FD_SET (sockfd, &fds);

    // normal timeout, can be changed by the wait option.
    timeout.tv_sec = 0;
    timeout.tv_usec = 0;

    return 0;
}

I'm not sure what this section is doing though. Could you explain it to me if you get time?

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;

    bcopy((char *)server->h_addr, (char *)&serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port = htons(portno);

Now I'm getting segfaults, but I haven't narrowed down why - I'm not sure it's completely related to this code change. Thanks for all of your help
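
On the question about that section: it fills in the address structure that connect() reads. bzero() clears the whole structure, sin_family says the address is IPv4, bcopy() copies the address bytes that gethostbyname() looked up (already in network byte order), and htons() converts the port number to network byte order. One way to check the result is to print the structure back out. The helper below is a made-up sketch for that, not part of the original code.

```c
#include <arpa/inet.h>   /* inet_ntop, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <stdio.h>       /* snprintf */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: render a filled-in sockaddr_in as "a.b.c.d:port". */
void describe_addr(const struct sockaddr_in *sa, char *buf, size_t buflen)
{
    char ip[INET_ADDRSTRLEN];

    /* Turn the binary address back into dotted-decimal text. */
    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip) == NULL)
        ip[0] = '\0';

    /* ntohs() undoes htons() so the port prints in host order. */
    snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(sa->sin_port));
}
```

Calling describe_addr(&serv_addr, buf, sizeof buf) just before connect() and printing buf should show the server's IP address followed by :25. If it shows something else, the structure was filled in wrong.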