Filtering out duplicates with the highest version number

Hi,

I have a huge text file of filenames which look like the following, i.e. uniquenumber_version_filename:

e.g.

1234_1_xxxx
1234_2_vfvfdbb
343333_1_vfvfdvd
2222222_1_ggggg
55555_1_xxxxxx
55555_2_vrbgbgg
55555_3_grgrbr

What I need to do is examine the file, find the duplicate uniquenumbers and then, for each duplicate, keep only the line with the highest version. For the example above the output would be:

1234_2_vfvfdbb
55555_3_grgrbr

Is there a scripted method by which I can do this?

Thanks in advance
Mantis

awk -F_ '$1==p { l=$0; d=1 } $1!=p { if (d) print l; d=0 } { p=$1 } END { if (d) print l }' infile

That's an amazing solution sir, but it looks a bit complicated.

Also, will I be able to apply it to a huge file full of uniquenumber_version_filename entries with thousands of rows?

Thanks

Another awk solution:

awk -F_ '{ C[$1]++ } $2 > 0+V[$1] { V[$1]=$2; F[$1]=$0 } END { for (k in V) if (C[k] > 1) print F[k] }' infile

This should work for quite large files; however, the output will be unsorted, and you didn't specify whether the original file order is important.
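If order does matter, one option (a sketch, assuming POSIX `sort` and `cut`; the file name `infile` and the sample data are just recreated here for illustration) is to tag each winning line with its original line number, sort numerically on that tag, then strip it off again:

```shell
# Recreate the sample data from the thread in a scratch file called 'infile'.
cat > infile <<'EOF'
1234_1_xxxx
1234_2_vfvfdbb
343333_1_vfvfdvd
2222222_1_ggggg
55555_1_xxxxxx
55555_2_vrbgbgg
55555_3_grgrbr
EOF

# Prefix each kept line with its input line number (NR), sort numerically
# on that prefix to restore the original order, then cut the prefix off.
out=$(awk -F_ '{ C[$1]++ }
    $2 > 0+V[$1] { V[$1] = $2; F[$1] = NR "\t" $0 }
    END { for (k in V) if (C[k] > 1) print F[k] }' infile |
    sort -n | cut -f2-)
printf '%s\n' "$out"
# 1234_2_vfvfdbb
# 55555_3_grgrbr
```

The decorate-sort-undedecorate step costs one extra pass over the (small) result set only, so it doesn't change how the solution scales with input size.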

If the "uniquenumber" field is clustered together, with the "version" values in ascending order (as shown in your example), you can use:

awk -F_ '{ if ( ($1 != u) && (v > 1) ) { print l } u=$1; v=$2; l=$0 } END { if (v > 1) print l }' yourfile
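Note that this clustered approach only flushes a group when the next group starts, so an `END` block is needed for the final group in the file to be printed as well (in the sample, 55555_3_grgrbr is the last line). A quick check against the example data (a sketch; the scratch file `yourfile` stands in for your real data):

```shell
# Recreate the sample data in a scratch file called 'yourfile'.
cat > yourfile <<'EOF'
1234_1_xxxx
1234_2_vfvfdbb
343333_1_vfvfdvd
2222222_1_ggggg
55555_1_xxxxxx
55555_2_vrbgbgg
55555_3_grgrbr
EOF

# u = current uniquenumber, v = last version seen, l = last line seen.
# A group is printed when the uniquenumber changes (or at EOF) and its
# last version is greater than 1, i.e. the number was duplicated.
out=$(awk -F_ '
    $1 != u && v > 1 { print l }
    { u = $1; v = $2; l = $0 }
    END { if (v > 1) print l }
' yourfile)
printf '%s\n' "$out"
# 1234_2_vfvfdbb
# 55555_3_grgrbr
```

This version reads the file once and keeps only three variables in memory, so it should be the fastest option for very large files, but it does rely on the input being clustered with versions ascending.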