Md5sum is running very slowly

Hi,

I am trying to generate MD5 hashes of strings. I am on Red Hat Linux. I need to build the MD5 from fields 25-27 of each record in a file and append it to the end of the record as a new field.

I have tried the code below, but it's painfully slow. Can you please suggest any alternatives or help me tune it?

awk -F"\�" '{ print $25" " $26 " " $27 }' /var/IBM/CMA/LandingArea/Analysis/Add.txt | while read x ; do echo $x |md5sum ; done

Do you have a compiler installed?

Which compiler are you referring to? I do have C++ installed.

$ cat md5line.c

#include <openssl/md5.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main()
{
        int n;
        MD5_CTX c;
        unsigned char buf[1024], out[MD5_DIGEST_LENGTH];

        while(fgets(buf, 1024, stdin))
        {
                MD5_Init(&c);
                MD5_Update(&c, buf, strlen(buf)-1);
                MD5_Final(out, &c);

                for(n=0; n<MD5_DIGEST_LENGTH; n++)
                        printf("%02x", out[n]);
                fputs("\n",stdout);
        }

        return(0);
}

$ gcc md5line.c -o md5line -lcrypto # The MD5_* functions live in libcrypto; may need the OpenSSL development package ("openssl-devel" on Red Hat)

$ printf "%s\n" a b c d e

a
b
c
d
e

$ echo "a" | md5sum # This includes the \n!
60b725f10c9c85c70d97880dfe8191b3  -

$ printf "a" | md5sum # Does not add \n
0cc175b9c0f1b6a831c399e269772661  -

$ printf "%s\n" a b c d e | ./md5line # My code also ignores \n
0cc175b9c0f1b6a831c399e269772661
92eb5ffee6ae2fec3ad71c777531578f
4a8a08f09d37b73795649038408b5f33
8277e0910d750195b448797616e091ad
e1671797c52e15f763380b45e841ec32

$

Thanks for the code. I tried to compile it, but I get the error messages shown below. I don't have the C/C++ skills to debug it. Can you please help me?

g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small md5tester.cpp  -o libmd5teser.so
md5tester.cpp:1:25: error: openssl/md5.h: No such file or directory
md5tester.cpp: In function 'int main()':
md5tester.cpp:9: error: 'MD5_CTX' was not declared in this scope
md5tester.cpp:9: error: expected `;' before 'c'
md5tester.cpp:10: error: 'MD5_DIGEST_LENGTH' was not declared in this scope
md5tester.cpp:12: error: invalid conversion from 'unsigned char*' to 'char*'
md5tester.cpp:12: error:   initializing argument 1 of 'char* fgets(char*, int, FILE*)'
md5tester.cpp:14: error: 'c' was not declared in this scope
md5tester.cpp:14: error: 'MD5_Init' was not declared in this scope
md5tester.cpp:15: error: invalid conversion from 'unsigned char*' to 'const char*'
md5tester.cpp:15: error:   initializing argument 1 of 'size_t strlen(const char*)'
md5tester.cpp:15: error: 'MD5_Update' was not declared in this scope
md5tester.cpp:16: error: 'out' was not declared in this scope
md5tester.cpp:16: error: 'MD5_Final' was not declared in this scope

How about using Perl?

#!/usr/bin/perl -w
use Digest::MD5 qw(md5 md5_hex md5_base64);

open my $DAT, $ARGV[0] or die "Could not open $ARGV[0]: $!";

while (my $line = <$DAT>) {
  chomp $line;
  my @fld = split /�/, $line;
  print $line . "�" . md5_hex($fld[24]." ".$fld[25]." ".$fld[26] . "\n") . "\n";
}

close $DAT;

Save as addmd5sum.pl and call it like this:

$ ./addmd5sum.pl /var/IBM/CMA/LandingArea/Analysis/Add.txt > /var/IBM/CMA/LandingArea/Analysis/Add_fixed.txt

Edit: I've included the newline in the hashed string, to match what echo | md5sum produces. Remove . "\n" from the md5_hex() call above if you don't need it.

Now I am getting this error:

Use of uninitialized value in concatenation (.) or string at ./addmd5sum.pl line 9, <$DAT> line 10398.

This is because the separator character you listed doesn't match, so split isn't finding 27 fields on that line. I might need an od -c dump of a line to get the proper value of the separator.

My test file below works OK:

������������������������L1F1�L1F2�L1F3
������������������������L2F1�L2F2�L2F3
������������������������L3F1�L3F2�L3F3

Edit: We can avoid perl errors and simply print invalid lines like this:

  if (scalar(@fld)>26) {
      print $line . "�" . md5_hex($fld[24]." ".$fld[25]." ".$fld[26] . "\n") . "\n";
  } else {
      print $line . "\n";
  }

but we've really got to sort out what that separator character is to get the results you want.

Thanks. I noticed that the string is being generated and appended to the new file.

Here is the octal dump:

0000000   X   X   X 254   X   X   X   |   1   0   0   0   0   4 254   1
0000020   6       X   X   X   X   X   X       X   X 254 254 254   X   X
0000040   X   X   X   X       X   X   X   X 254   X   X 254   2   4   X
0000060   8 254   5   1   7   8   9   1   3   1 254   X   X   X 254 254
0000100 254 254 254 254   1   6 254 254   X   X   X   X   X   x       D
0000120   r 254 254   X   X   X   x   X   X       X   X   X   X 254   X
0000140   S   W 254   1   4   2   8 254   A   U   S   T   R   A   L   I
0000160   A 254   5   X   X   X   9   1   3   1 254 254   1   6 254   A
0000200   X   X   X   X   X       D   r 254   R   9   1   3   0   0   0
0000220   0   0   0   6   0   :   A   U   S  \r  \n
0000233

OK, it looks like your separator character is byte value 172 (chr(172) in Perl), that is 254 in octal.
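For anyone who wants to double-check that reading of the dump from the shell, here is a small sketch. The helper name count_seps is made up for illustration; it only assumes the usual head, tr and wc tools.

```shell
# Octal 254 as a decimal number (a leading 0 makes printf read it as octal)
printf '%d\n' 0254

# Count the separator bytes (octal 254) on the first line of a file.
# The number of fields on that line is this count plus one.
# count_seps is a hypothetical helper name, not part of any package.
count_seps() {
    head -n 1 "$1" | tr -cd '\254' | wc -c
}
```

Calling count_seps on the data file should report at least 26 if the first record really has 27 or more fields.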

It also looks like a DOS-formatted file (each line ends in \r\n). This should get the job done. Note that I'm putting the \r characters back to retain the DOS format on output.

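#!/usr/bin/perl -w
use strict;
use Digest::MD5 qw(md5 md5_hex md5_base64);

# Field separator: byte value 172 (octal 254)
my $sep = chr(172);

open my $DAT, '<', $ARGV[0] or die "Could not open $ARGV[0]: $!";

while (my $line = <$DAT>) {
  $line =~ s/\r?\n$//;   # strip the DOS or Unix line ending
  # A limit of -1 keeps trailing empty fields, so the field count is accurate
  my @fld = split /\Q$sep\E/, $line, -1;
  if (scalar(@fld)>26) {
      # Hash fields 25-27 joined by spaces, plus "\n" to match echo | md5sum
      print $line . $sep . md5_hex($fld[24]." ".$fld[25]." ".$fld[26] . "\n") . "\r\n";
  } else {
      # Fewer than 27 fields: pass the line through unchanged
      print $line . "\r\n";
  }
}

close $DAT;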

Well, this is not processing anything at all. It's giving the error message below:

Can't locate object method "e" via package "Digest::MD5" (perhaps you forgot to load "Digest::MD5"?) at ./addmd5sum.pl line 1.

I'm not getting that error with the code from post #10. Please check that you copied it correctly.

Sorry, it was a copy-paste issue. It works perfectly :) Thanks for all your help!

perl -MDigest::MD5=md5_hex -aF\\254 -lpe '$_=md5_hex(join(" ", @F[24..26]));'

Regards,
Alister

Edit: This is not equivalent to Chubler's suggestion and it only prints the hashes. Oh well. At least I spent a little time kicking some rust off my very puny perl muscles.


Nice alister - my perl skills are weak!

Can that beauty be enhanced to append the md5sum to the input file as a new field?

As I said, you will probably need to install the OpenSSL development package ("openssl-devel" on Red Hat, or whatever your distribution happens to call it). md5.h is missing because that package is not installed.
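A quick way to see whether the header is there, as a sketch only: the helper name check_file is made up, and /usr/include/openssl/md5.h is the usual location but is an assumption for any particular system.

```shell
# Report whether a file exists. check_file is a hypothetical helper name.
check_file() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Typical header location (an assumption; it may differ on your system)
check_file /usr/include/openssl/md5.h

# On RPM-based systems the package database can be queried as well:
#   rpm -q openssl openssl-devel
```

If the header is reported missing, installing the development package should provide both the headers and the files needed to link.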

Do you mind sharing just the header file?

I doubt you will get much out of the .h file without the library (.so files) on your system, as the .h file just defines the functions exported from the library.

Anything compiled will not link without the library.

I agree, but I just wanted to give it a shot with the compilers I have :wink:

I would be astonished if Red Hat didn't come with openssl...

But he doesn't need "a header file" -- he needs openssl's header files, a bunch of them, and the right ones for his version of openssl, correctly installed and configured in the correct places.