Comparing rows and columns

Hi,

I have a .csv file with the data below:

file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3

I want to remove the duplicate file names, keeping only the one with the highest version number.

The output should be:

file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3

Is there any way I can do this using awk or sed?

Thanks,
prav

for i in `cut -d, -f1 file_name | sort -u`; do sort -rk2 file_name | grep -m1 $i; done


kamaraj@kamaraj-laptop:~/Desktop/Scripts$ cat interchange 
file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3
kamaraj@kamaraj-laptop:~/Desktop/Scripts$ for i in `cut -d, -f1 interchange | sort -u`; do sort -rk2 interchange | grep -m1 $i; done
file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3
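
A small caveat: sort -rk2 compares the version field as plain text, so something like 2.10 would sort below 2.5. If GNU sort is available, its -V (version sort) key flag handles that; a possible variant (just a sketch, assuming GNU sort and the comma-plus-space layout shown above):

sort -t, -b -k2,2rV interchange | awk -F, '!seen[$1]++'

The awk part keeps only the first line seen for each file name, which after the descending version sort is the highest one; note the output order follows the sort rather than the original file order.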

Try:

awk '{x=$2;gsub("\\.","",x);for (i=1;i<=(3-length(x));i++){x=x"0"};if (a[$1]<x){a[$1]=x;b[$1]=$0}}END{for (i in b){print b[i]}}' file.csv

This assumes the version number has at most 3 digits in total. If it can have more, change the "3" to whatever suits your data.
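
If you would rather not rely on a fixed number of digits, a variant could compare the dot-separated segments numerically instead. Just a sketch, assuming the comma-plus-space layout from the sample (same file.csv as above):

awk -F', ' '
# newer(v1, v2): return 1 if version v1 is higher than v2, comparing dot-separated segments numerically
function newer(v1, v2,    a, b, n, m, i) {
    n = split(v1, a, "."); m = split(v2, b, ".")
    for (i = 1; i <= (n > m ? n : m); i++) {
        if (a[i] + 0 > b[i] + 0) return 1
        if (a[i] + 0 < b[i] + 0) return 0
    }
    return 0
}
!($1 in best) || newer($2, best[$1]) { best[$1] = $2; line[$1] = $0 }
END { for (f in line) print line[f] }
' file.csv

The for (f in line) loop does not guarantee any particular output order, so pipe through sort afterwards if that matters.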

Thanks, both of you.

But I got it to work this way:

 
#!/bin/sh
# file names that appear more than once
REPEATED_FILES=`cut -f1 -d, test_version_file | sort | uniq -d`
for repeat in $REPEATED_FILES
do
    # highest version first; awk prints only lines whose version exceeds the previous one,
    # which after the reverse sort is just the first line
    grep "$repeat" test_version_file | sort -rk2 | awk -F, '{if ($2 > val) {val=$2; print $0}}'
done
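
Note that the loop only visits names returned by uniq -d, so file names that occur exactly once are never printed. If those lines should appear in the output as well, something along these lines could be appended (a rough sketch using the same file name; the "." in the names matches any character in grep, which is harmless here):

cut -f1 -d, test_version_file | sort | uniq -u | while read name; do grep "$name" test_version_file; done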

It's a variation of itkamaraj's code.

@itkamaraj: the grep -m option doesn't work on my machine.
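
For what it's worth, if grep -m isn't supported, piping through head -1 gives the same first-match-only effect in the earlier one-liner (untested sketch, same file name as in the post above):

for i in `cut -d, -f1 interchange | sort -u`; do sort -rk2 interchange | grep "$i" | head -1; done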