Common lines from files

jaysean · June 28, 2010, 11:00am

Hello guys,

I need a script that gets the common lines from two files, with one criterion: if the first two columns match, keep the maximum value of the 3rd column. (The columns are tab-separated.)

Sample input:

file1:

111 222 0.1
333 444 0.5
555 666 0.4

file2:

111 222 0.7
555 666 0.3
777 888 0.4

Sample output:

111 222 0.7
555 666 0.4

This needs to be done for every file in a directory; all the files have the same format. I already have a script that ignores the 3rd-column condition:

ls DirectoryA | while read FILE; do
  comm -12 DirectoryA/"$FILE" DirectoryB/"$FILE" >> DirectoryC/"$FILE"
done

Please help. Thanks in advance.

guruprasadpr · June 28, 2010, 11:13am

Hi

awk 'NR==FNR{a[$1" "$2]=$3;next;}{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2

Guru.

jaysean · June 29, 2010, 2:20am

Thanks for the reply, but the script has a problem: it does not discard the lines that are not common. The output needs to be the intersection of the lines (i.e. only keys present in both files), and for each of those it should show the greater of the two 3rd-column values.

guruprasadpr · June 29, 2010, 2:35am

Oops...

awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2

Guru.

bartus11 · June 29, 2010, 2:35am

Fix for Guru's code:

awk 'NR==FNR{a[$1" "$2]=$3;next;}length(a[$1" "$2])>0{ if (a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' file1 file2
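
Note on the two fixes: both filter out keys that are missing from file1, but they test membership differently. The `in` operator only checks whether the key exists, while `length(a[key])` references the element, and referencing a missing element in awk creates it with an empty value. The `length` test would also skip a key whose stored 3rd column is empty. The small sketch below illustrates the side effect; `demo_membership` and `count` are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Hypothetical demo (not from the thread): compare the two membership tests.
demo_membership() {
  printf '111 222 0.1\n' | awk '
    { a[$1 " " $2] = $3 }                  # store one key, as in the answers above
    END {
      key = "777 888"                      # a key that was never stored
      if (key in a) print "in: found"      # pure existence test, no side effect
      print "size after in test:", count(a)
      if (length(a[key]) > 0) print "length: found"   # referencing a[key] creates it
      print "size after length test:", count(a)
    }
    # count the elements of an array; k and n are local variables
    function count(arr,   k, n) { n = 0; for (k in arr) n++; return n }
  '
}

demo_membership
```

For this thread's data the two one-liners print the same lines; the difference only matters if the array is inspected later or memory is a concern on large inputs.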

jaysean · June 29, 2010, 3:19am

Thanks to both of you. Both versions work fine. In case anyone needs it, here is the version that processes a whole directory:

ls DirectoryA | while read FILE; do
  awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}' DirectoryA/"$FILE" DirectoryB/"$FILE" | tr ' ' '\t' > DirectoryC/"$FILE"
done

The tr is there because my files are tab-separated and the tabs were somehow replaced by spaces in the output.
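
A likely reason for the lost tabs: `print $1, $2, x` joins its arguments with awk's output field separator (OFS), which defaults to a single space. Setting both the input separator and OFS to a tab should make the `tr` step unnecessary, and it avoids converting spaces that might occur inside a field. The following is a minimal sketch under those assumptions, not a drop-in replacement; the function names `common_max` and `process_dirs` are hypothetical, and it assumes three tab-separated columns with a numeric 3rd column and a non-empty first file.

```shell
#!/bin/sh
# Hypothetical helper: print rows whose first two columns occur in both
# files, keeping the larger 3rd column. Input and output are tab-separated.
common_max() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { best[$1 FS $2] = $3; next }   # first file: remember column 3 per key
    ($1 FS $2) in best {                      # second file: keep only shared keys
      k = $1 FS $2
      # adding 0 forces a numeric comparison; parentheses keep ">" from
      # being parsed as output redirection
      print $1, $2, (best[k] + 0 > $3 + 0 ? best[k] : $3)
    }
  ' "$1" "$2"
}

# Hypothetical wrapper: $1 and $2 are the input directories, $3 the output one.
process_dirs() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue                   # skip subdirectories and empty globs
    name=${f##*/}                             # strip the directory part
    [ -f "$2/$name" ] || continue             # no counterpart in the second directory
    common_max "$f" "$2/$name" > "$3/$name"
  done
}

# Usage with the directory names from this thread:
# process_dirs DirectoryA DirectoryB DirectoryC
```

The glob loop is used instead of `ls | while read` so that file names containing spaces or backslashes are handled as-is.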