Byte swap timing

I have noticed a difference in byte swap timing between two Ubuntu systems. bswap_32 used to work just fine on the old system, but on the new one it lags behind a home-grown swap.

My code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <byteswap.h>
#include <sys/time.h>
union _U {
        unsigned int in;
        struct _CH {
                unsigned char c[4];
        } CH;
};

int main(int argc, char *argv[])
{
        struct timeval t1, t2;
        union _U u;
        int n;
        unsigned char tmp;

        u.in = (argc == 2) ? atoi(argv[1]) : 0xff;
        gettimeofday(&t1, NULL);
        for(n = 0; n < 100 * 1000000; n++)
        {
                u.in = n;

                // Version 1
                u.in = bswap_32(u.in);

                // Version 2 (homegrown)
                //tmp = u.CH.c[0];
                //u.CH.c[0] = u.CH.c[3];
                //u.CH.c[1] = u.CH.c[2];
                //u.CH.c[2] = u.CH.c[1];
                //u.CH.c[3] = tmp;
        }
        gettimeofday(&t2, NULL);
        printf("%lu mls\n",
                (t2.tv_sec - t1.tv_sec) * 1000000 + (t2.tv_usec - t1.tv_usec));
        return(0);
}

I compile it on both systems simply as

cc -g

+++

Old system : ProLiant ML350 G4 / Intel(R) Xeon(TM) CPU 3.20GHz (about 8 years old)
Linux OLD 2.6.24-32-server #1 SMP Thu Jul 12 15:21:48 UTC 2012 i686 GNU/Linux
gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)

Version 1 avg 500,000 mls, version 2 avg 1,250,000 mls

+++

New system : ProLiant DL360p Gen8 / Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz 4 cores (about 3 years old)
Linux NEW 3.13.0-55-generic #94-Ubuntu SMP Thu Jun 18 00:27:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2)

Version 1 avg 480,000 mls, version 2 avg 390,000 mls

+++

Why is the standard bswap_32 quite a bit slower than manually shuffling the bytes on the new system?

Thanks in advance.

Shouldn't it be

                 // Version 2 (homegrown)
                tmp = u.CH.c[0];
                u.CH.c[0] = u.CH.c[3];
                u.CH.c[3] = tmp;
                tmp = u.CH.c[1];
                u.CH.c[1] = u.CH.c[2];
                u.CH.c[2] = tmp;

?
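
For what it's worth, here is a minimal sketch for checking the corrected shuffle against the sequence as originally posted. The function names swap_bytes and swap_bytes_as_posted are made up for this example, and uint32_t stands in for the unsigned int of the original union.

```c
#include <stdint.h>

/* Same layout idea as the union in the original post. */
union word {
        uint32_t      in;
        unsigned char c[4];
};

/* Corrected shuffle: exchange bytes 0<->3 and 1<->2. */
uint32_t swap_bytes(uint32_t x)
{
        union word u;
        unsigned char tmp;

        u.in = x;
        tmp = u.c[0]; u.c[0] = u.c[3]; u.c[3] = tmp;
        tmp = u.c[1]; u.c[1] = u.c[2]; u.c[2] = tmp;
        return u.in;
}

/* The sequence as originally posted, kept only for comparison:
 * c[2] is overwritten with the already-modified c[1], so one
 * of the middle bytes is lost. */
uint32_t swap_bytes_as_posted(uint32_t x)
{
        union word u;
        unsigned char tmp;

        u.in = x;
        tmp = u.c[0];
        u.c[0] = u.c[3];
        u.c[1] = u.c[2];
        u.c[2] = u.c[1];
        u.c[3] = tmp;
        return u.in;
}
```

Calling swap_bytes twice should give back the input, while the posted sequence does not produce a full byte reversal whenever the two middle bytes differ. It also does less work per iteration, so its timing is not directly comparable.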

Compiler differences likely account for a large part of it, or perhaps library differences. There isn't really a significant difference in the bswap_32 timing on your new machine. Why it's happening depends on the assembly the compiler is generating, but I suspect bswap_32 hasn't gotten worse so much as the compiler has gotten better.

(to MadeInGermany) Yes, it should, my bad (copy/paste difficulties in IE). But my question is: why does bswap take longer? Any ideas?

(to Corona) The newer compiler is better, I assume, but the timing of bswap suffered. How?

That bswap is slower than the macro does not mean bswap got worse. For all we know, the macro got optimized better, to the degree that it beat the external bswap. Too many circumstances changed to deduce anything from those numbers.
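
One way to take a few of those variables out is to build the same timing loop with the same optimization flags on both machines (cc -g alone leaves gcc at its default of no optimization) and to keep the result of every swap so the loop cannot be dropped. A rough sketch, not a rigorous benchmark; elapsed_us and time_shift_swap are names invented for this example:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime on strict compilers */
#endif
#include <stdint.h>
#include <time.h>

/* Microseconds between two clock readings. */
long long elapsed_us(struct timespec start, struct timespec end)
{
        long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                     + (long long)(end.tv_nsec - start.tv_nsec);
        return ns / 1000;
}

/* Plain shift-and-mask swap; replace with the variant under test. */
static inline uint32_t swap_under_test(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
               ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Runs the swap `iterations` times and returns the elapsed microseconds.
 * Every result is XORed into *sink so the work stays observable. */
long long time_shift_swap(uint32_t iterations, uint32_t *sink)
{
        struct timespec t1, t2;
        uint32_t acc = 0;
        uint32_t n;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (n = 0; n < iterations; n++)
                acc ^= swap_under_test(n);
        clock_gettime(CLOCK_MONOTONIC, &t2);

        *sink = acc;
        return elapsed_us(t1, t2);
}
```

Print the sink value along with the time, and run each variant several times. Even then the numbers only compare compiler output on that machine, not the swap technique in general.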

If you insist on a completely wild guess, I would wonder if bswap() in modern gcc is inline assembly, and the compiler is optimizing the macro into a bswap instruction. The compiler can do a better job optimizing instructions it makes than instructions you force it to use. But to get a definitive answer on what's going on you'll have to take a look at the assembly.
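
To look at the assembly, it helps to put each variant in its own small function so the generated code is easy to find. A sketch, assuming GCC or Clang: __builtin_bswap32 is a compiler extension (as far as I know available from GCC 4.3 on, so not in gcc 4.2.4), and the function names are made up for this example.

```c
#include <stdint.h>

/* Lets the compiler pick the byte swap sequence itself. */
uint32_t swap_builtin(uint32_t x)
{
        return __builtin_bswap32(x);
}

/* Portable shift-and-mask version. Whether the compiler reduces this
 * to a single instruction depends on its version and flags. */
uint32_t swap_shifts(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) |
               ((x & 0x0000ff00u) << 8)  |
               ((x >> 8) & 0x0000ff00u)  |
               (x >> 24);
}
```

Compile with something like gcc -O2 -S file.c (and again without -O2) on each system, then compare the body of each function in the resulting file.s.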
