Comparing rows and columns

pravsripad · April 10, 2011, 10:17am

Hi,

I have a .csv file with the data below:

file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3

I want to remove the duplicate file names, keeping only the entry with the highest version number for each.

The output should be:

file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3

Is there any way I can do this using awk or sed?

thanks
prav

itkamaraj · April 10, 2011, 10:35am

for i in `cut -d, -f1 file_name | sort -u`; do sort -rk2 file_name | grep -m1 $i; done

kamaraj@kamaraj-laptop:~/Desktop/Scripts$ cat interchange 
file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3
kamaraj@kamaraj-laptop:~/Desktop/Scripts$ for i in `cut -d, -f1 interchange | sort -u`; do sort -rk2 interchange | grep -m1 $i; done
file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3
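
The loop above sorts the versions as plain text, so a version such as 3.10 would rank below 3.9. A possible single-pass alternative is sketched here, assuming GNU sort is available (its -V version-sort option is a GNU extension, not POSIX). The helper name latest_by_sort and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch only: relies on GNU sort's -V (version sort), which is not POSIX.
# latest_by_sort is a hypothetical helper name, not part of the thread.
latest_by_sort() {
    # Sort by name (field 1), then by version (field 2) in descending
    # version order; awk then keeps the first line seen for each name.
    sort -t, -k1,1 -k2,2Vr "$1" | awk -F, '!seen[$1]++'
}

# Sample data from the question, written to a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3
EOF
latest_by_sort "$tmp"
rm -f "$tmp"
```

This also avoids grep -m, which is not available in every grep implementation.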

bartus11 · April 10, 2011, 10:39am

Try:

 awk '{x=$2; gsub(/\./,"",x); while (length(x)<3) x=x "0"; if (a[$1]<x) {a[$1]=x; b[$1]=$0}} END {for (i in b) print b[i]}' file.csv

This assumes that a version number has at most 3 digits in total. If it can have more, change the 3 in the length(x) expression to a suitable value.
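
Padding the digits treats versions such as 1.10 and 11.0 alike, since both collapse to 110. A possible variant that compares the dot-separated components one by one is sketched below. It assumes every component is a plain number, and the names latest_by_awk and newer are hypothetical helpers introduced only for this example.

```shell
#!/bin/sh
# Sketch only: latest_by_awk and newer() are hypothetical names.
# Assumes each version component is a plain non-negative number.
latest_by_awk() {
    awk -F', *' '
    # Return 1 if version a is newer than version b, comparing the
    # dot-separated components numerically from left to right.
    function newer(a, b,    pa, pb, na, nb, n, i) {
        na = split(a, pa, ".")
        nb = split(b, pb, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # A missing component counts as 0, so 2.5 equals 2.5.0
            if (pa[i] + 0 != pb[i] + 0)
                return (pa[i] + 0 > pb[i] + 0)
        }
        return 0
    }
    # Remember the line with the newest version seen for each name
    !($1 in best) || newer($2, best[$1]) { best[$1] = $2; line[$1] = $0 }
    END { for (name in line) print line[name] }
    ' "$1" | sort   # for-in order is unspecified, so sort the result
}

# Sample data from the question, written to a temporary file
tmp=$(mktemp)
printf '%s\n' 'file1.h, 2.0' 'file2.c, 3.1' 'file1.h, 2.5' \
    'file3.c, 3.3.3' 'file1.h, 1.2.3' > "$tmp"
latest_by_awk "$tmp"
rm -f "$tmp"
```

With this approach there is no digit limit to adjust, at the cost of a longer script.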

pravsripad · April 10, 2011, 11:47am

Thanks, both of you.

But I got it to work this way:

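#!/bin/sh
# Collect every distinct file name, not only the repeated ones, so that
# names appearing once are printed too.
FILE_NAMES=`cut -f1 -d, test_version_file | sort -u`
for name in $FILE_NAMES
do
    # Reverse text sort on the version puts the highest first; awk prints
    # only lines whose version beats the best seen so far.
    grep "^$name," test_version_file | sort -rk2 | awk -F, '{if($2 > val) {val=$2;print $0;}}'
done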

It's a variation of itkamaraj's code.

@itkamaraj: the grep -m option doesn't work on my machine.