Common lines from files

Hello guys,

I need a script to get the common lines from two files, with one criterion: if the first two columns match, keep the line with the maximum value in the 3rd column. (The columns are tab-separated.)

Sample input:

file1:

111 222 0.1
333 444 0.5
555 666 0.4

file2:

111 222 0.7
555 666 0.3
777 888 0.4

sample output:

111 222 0.7
555 666 0.4

This needs to be done for every file in a directory; all the files have the same format. I already have a script that ignores the 3rd-column condition:

ls DirectoryA | while read FILE; do
  comm -12 DirectoryA/"$FILE" DirectoryB/"$FILE" >> DirectoryC/"$FILE"
done

Please help. Thanks in advance.

Hi

awk 'NR==FNR{a[$1" "$2]=$3;next;}{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2

Guru.

Thanks for the reply, but the script has a problem: it does not discard the lines that are not common. The output needs to be the intersection of the two files (i.e. lines whose first two columns appear in both), and for each such line the 3rd column should show the greater of the two values.

Oops...

awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2

Guru.

Fix for Guru's code:

awk 'NR==FNR{a[$1" "$2]=$3;next;}length(a[$1" "$2])>0{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2
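
A side note on the two variants above, as far as I understand awk: the `in` test only checks membership, while referencing a[...] inside length() creates an empty element for every key that was missing. Both give the same output here, but the second one grows the array. A small sketch to see the difference (the key "x" and the counters n1/n2 are just illustrative names):

```shell
# Count array elements before and after referencing a missing key.
side_effect_count=$(awk 'BEGIN {
  if ("x" in a) print "unreachable"             # membership test: does not create a["x"]
  n1 = 0; for (k in a) n1++                     # elements so far
  if (length(a["x"]) > 0) print "unreachable"   # reference: creates an empty a["x"]
  n2 = 0; for (k in a) n2++                     # elements after the reference
  print n1, n2
}')
echo "$side_effect_count"   # prints: 0 1
```

For large second files with many non-matching keys, the `in` form is probably the safer habit.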

Thanks to both of you. Both versions work fine. In case anyone needs it, here is the version that processes a whole directory:

ls DirectoryA | while read FILE; do
  awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' DirectoryA/"$FILE" DirectoryB/"$FILE" | tr ' ' '\t' > DirectoryC/"$FILE"
done

The tr is there because my files are tab-separated, and the awk output came out with spaces instead of tabs.
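
The spaces most likely come from awk itself: `print $1, $2, ...` joins its arguments with the output field separator OFS, which defaults to a single space. A possible sketch that sets both the input and output separators to a tab, so the tr step is not needed. The function name common_max is made up for this example, and the loop assumes the DirectoryA/DirectoryB/DirectoryC layout from this thread:

```shell
#!/bin/sh
# common_max FILE1 FILE2 (hypothetical helper, not from the thread):
# print lines whose first two columns occur in both files,
# with the larger of the two 3rd-column values, tab-separated.
common_max() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { a[$1 OFS $2] = $3; next }    # first file: remember column 3 per key
    ($1 OFS $2) in a {                       # second file: keep only shared keys
      k = $1 OFS $2
      print $1, $2, (a[k] + 0 > $3 + 0 ? a[k] : $3)   # numeric maximum of column 3
    }
  ' "$1" "$2"
}

# Directory loop; files missing from DirectoryB are skipped.
for f in DirectoryA/*; do
  [ -f "$f" ] || continue
  name=${f##*/}
  [ -f "DirectoryB/$name" ] || continue
  common_max "$f" "DirectoryB/$name" > "DirectoryC/$name"
done
```

Using a glob instead of `ls | while read` also keeps file names with spaces intact. The `+ 0` forces a numeric comparison in case a value would otherwise be compared as a string.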