Hi,
I have a .csv file with the data as below:
file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3
I want to remove the duplicate file names, keeping only the entry with the highest version number for each file.
output should be
file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3
Is there any way I can do this using awk or sed?
thanks
prav
for i in `cut -d, -f1 file_name | sort -u`; do sort -rk2 file_name | grep -m1 $i; done
kamaraj@kamaraj-laptop:~/Desktop/Scripts$ cat interchange
file1.h, 2.0
file2.c, 3.1
file1.h, 2.5
file3.c, 3.3.3
file1.h, 1.2.3
kamaraj@kamaraj-laptop:~/Desktop/Scripts$ for i in `cut -d, -f1 interchange | sort -u`; do sort -rk2 interchange | grep -m1 $i; done
file1.h, 2.5
file2.c, 3.1
file3.c, 3.3.3
Try:
awk '{x=$2;gsub("\\.","",x);for (i=1;i<=(3-length(x));i++){x=x"0"};if (a[$1]<x){a[$1]=x;b[$1]=$0}}END{for (i in b){print b[i]}}' file.csv
This assumes that the highest number of digits in a version number is 3. If it is higher, change the "3" in (3-length(x)) to whatever suits your data.
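If the digit-count assumption is a problem (for example, 1.10 against 1.9), one alternative is to compare the versions component by component as numbers. The following is only a sketch under assumptions: the function names highest_versions and newer are made up for illustration, the field separator is taken to be a comma followed by optional spaces, and missing components are treated as 0.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): for each file name, keep the
# line whose version is highest when compared component by component.
highest_versions() {
    awk -F', *' '
    # Return 1 if version a is strictly greater than version b, else 0.
    function newer(a, b,    x, y, n, m, i, max) {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            # A missing component counts as 0, so 2.5 and 2.5.0 are equal.
            if ((x[i] + 0) > (y[i] + 0)) return 1
            if ((x[i] + 0) < (y[i] + 0)) return 0
        }
        return 0
    }
    # Remember a line if its name is new or its version beats the stored one.
    NF >= 2 && (!($1 in best) || newer($2, best[$1])) {
        best[$1] = $2
        line[$1] = $0
    }
    # The order of "for (name in line)" is unspecified, hence the sort below.
    END { for (name in line) print line[name] }
    ' "$1" | sort
}
```

It would be called as highest_versions file.csv. Non-numeric components (such as "3a") are read as their leading number, so this only suits purely numeric versions.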
Thanks, both of you.
But I could get it to work this way:
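#!/bin/sh
# Names that occur more than once (uniq -d prints only duplicated lines,
# so names that appear a single time are not handled by this loop).
REPEATED_FILES=$(cut -f1 -d, test_version_file | sort | uniq -d)
for repeat in $REPEATED_FILES
do
    # Reverse sort puts the highest version string first; awk prints a line
    # only when its version exceeds the largest seen so far.
    grep "$repeat" test_version_file | sort -rk2 | awk -F, '{if($2 > val) {val=$2;print $0;}}'
done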
It's a variation of itkamaraj's code.
@itkamaraj: the grep -m option doesn't work on my machine.
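Where grep -m is unavailable, the same "first match per name" idea can probably be had from sort and awk alone. This is a rough sketch, not anyone's posted solution: the function name first_per_name is hypothetical, and like the original loop it ranks versions by plain reverse text order, so it shares that loop's limits (10.0 would rank below 9.0).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print one line per file name
# without relying on grep -m.
first_per_name() {
    # Sort by name, then by the version field in reverse text order, so the
    # highest version string comes first within each name. LC_ALL=C keeps
    # the ordering byte-based and independent of the locale.
    # awk then prints only the first line seen for each name.
    LC_ALL=C sort -t, -k1,1 -k2,2r "$1" | awk -F, '!seen[$1]++'
}
```

Calling first_per_name interchange on the sample data should print the three expected lines in a single pass, with no loop over the file names.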