Hello guys,
I need a script that gets the lines common to two files, with one criterion: when the first two columns match, keep the maximum value of the 3rd column. (The columns are tab-separated.)
Sample input:
file1:
111 222 0.1
333 444 0.5
555 666 0.4
file2:
111 222 0.7
555 666 0.3
777 888 0.4
Sample output:
111 222 0.7
555 666 0.4
This needs to be done for every file of the same format in a directory. I have a script that does it without the 3rd-column condition:
ls DirectoryA | while read FILE; do
comm -12 DirectoryA/"$FILE" DirectoryB/"$FILE" >> DirectoryC/"$FILE"
done
Please help. Thanks in advance.
Hi
awk 'NR==FNR{a[$1" "$2]=$3;next;}{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2
Guru.
Thanks for the reply, but the script has a problem: it does not discard the lines that are not common. The output needs to be the intersection of the lines (i.e. those whose first two columns appear in both files), with the 3rd column showing the greater of the two values.
Oops...
awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2
Guru.
Fix for Guru's code:
awk 'NR==FNR{a[$1" "$2]=$3;next;}length(a[$1" "$2])>0{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2
Thanks to both of you. Both versions work fine. In case anyone needs it, here is the loop for processing a whole directory:
ls DirectoryA | while read FILE; do
awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' DirectoryA/"$FILE" DirectoryB/"$FILE" | tr ' ' '\t' > DirectoryC/"$FILE"
done
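The tr is there because my files are tab-separated and the output came out with mixed separators: when awk prints $1, $2 and the stored value as separate arguments it joins them with its output field separator, which defaults to a single space, while the plain print emits the original tab-separated line unchanged.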
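For anyone landing here later, here is a minimal sketch of the same approach that sets awk's input and output separators to a tab, so the tr step should not be needed. The function names merge_max and process_dirs are hypothetical helpers, not from the posts above. It assumes the 3rd column is numeric and that the first file is not empty (NR==FNR cannot tell the two files apart otherwise).

```shell
# merge_max FILE1 FILE2 (hypothetical helper name)
# Prints lines whose first two columns occur in both files,
# with the larger of the two 3rd-column values.
merge_max() {
    awk -F'\t' -v OFS='\t' '
        # First file: remember column 3 under the key built from columns 1 and 2.
        NR == FNR { a[$1 OFS $2] = $3; next }
        # Second file: only keys already seen in the first file are printed.
        ($1 OFS $2) in a {
            k = $1 OFS $2
            # Adding 0 forces a numeric comparison; the original text is printed.
            print $1, $2, (a[k] + 0 > $3 + 0 ? a[k] : $3)
        }
    ' "$1" "$2"
}

# process_dirs DIR_A DIR_B DIR_OUT (hypothetical helper name)
# Runs merge_max on every file name present in both input directories.
process_dirs() {
    for f in "$1"/*; do
        name=${f##*/}
        [ -f "$f" ] && [ -f "$2/$name" ] || continue
        merge_max "$f" "$2/$name" > "$3/$name"
    done
}

# Usage: process_dirs DirectoryA DirectoryB DirectoryC
```

The loop uses a glob rather than the output of ls, which avoids trouble with file names containing spaces, and it skips any file that has no counterpart in the second directory.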