awk to compare files and count

cmccabe · June 12, 2015, 11:19am

I am trying to compare two files and write the results to two output files: lines that are the same go to concordant.txt, and lines that do not match go to discordant.txt. Is there also a way to count the lines that come after a specific header line (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT E1) in each file? Thank you :).

I have attached the two files to compare as well as the output of the command below. By my math (which is not good), using manual methods it looks like there are 8786 lines that are the same and 100 that are not the same.

awk 'FNR==NR {a[$1]; next} $1 in a' IonXpress_009_run1.txt IonXpress_009_run2.txt > concordant.txt
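Note that the command above keys on $1 only, so it prints every line of the second file whose first field appears anywhere in the first file, and it never writes discordant.txt. A minimal sketch of a whole-line comparison follows, assuming "the same" means the entire line is identical. The files run1.txt and run2.txt and their contents are hypothetical stand-ins for the attached files.

```shell
# Hypothetical stand-in files; the real inputs would be the IonXpress_009_run*.txt files.
printf 'chr1\t100\tA\tG\nchr1\t200\tC\tT\nchr2\t300\tG\tA\n' > run1.txt
printf 'chr1\t100\tA\tG\nchr1\t200\tC\tG\nchr2\t300\tG\tA\n' > run2.txt

# First file (FNR==NR): remember every whole line.
# Second file: a line seen in the first file goes to concordant.txt,
# any other line goes to discordant.txt.
awk 'FNR==NR { seen[$0]; next }
     { print > (($0 in seen) ? "concordant.txt" : "discordant.txt") }' run1.txt run2.txt
```

This only classifies the lines of the second file. Lines that occur only in the first file are not reported, so a second run with the file order swapped may be needed to see them.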

jim_mcnamara · June 12, 2015, 11:24am

Have you considered using the diff command and then working with its output? A lot of your questions fall into the 'one-off' category of code. Sometimes a piecewise approach allows you to reuse code for a different scenario.

cmccabe · June 12, 2015, 11:37am

I have never used the diff command, but it looks like the < is what is different?

It looks like this will give me the matching (w/o the <) and the non-matching (w/ the <) and then I can use awk to count the differences.

 diff -y IonXpress_009_run1.txt IonXpress_009_run2.txt > concordent.txt
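For counting, the default (not side-by-side) diff output may be easier to parse: lines starting with "< " exist only in the first file and lines starting with "> " exist only in the second. A rough sketch, with hypothetical stand-in files run1.txt and run2.txt:

```shell
# Hypothetical stand-in files for illustration.
printf 'a\nb\nc\n' > run1.txt
printf 'a\nx\nc\n' > run2.txt

# diff exits with status 1 when the files differ, so that is not treated as an error.
diff run1.txt run2.txt > diff.txt || true

# grep -c prints the count; it exits non-zero when the count is 0, hence "|| true".
only_in_1=$(grep -c '^< ' diff.txt || true)
only_in_2=$(grep -c '^> ' diff.txt || true)
echo "only in run1: $only_in_1, only in run2: $only_in_2"
```

Unlike the awk approach, diff is sensitive to line order, so the same records in a different order would be reported as differences.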

awk 'FNR==NR{c=NR;next}END{print (c==FNR)?"\nAll Good\n":"\nDifference of\t" c-FNR "\trecords\n"}' IonXpress_009_run1.txt IonXpress_009_run2.txt > concordent2.txt

Is there a way for awk to also count the matching lines as well as the differences?
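One possible way to get both counts in a single awk pass is sketched below. It assumes that records start on the line after the one beginning with #CHROM, and that a match means the whole line is identical. The file names, sample contents, and counts.txt are hypothetical.

```shell
# Hypothetical stand-in files: some metadata, the #CHROM header, then records.
printf '##meta\n#CHROM\tPOS\nchr1\t100\nchr1\t200\nchr2\t300\n' > run1.txt
printf '##meta2\n#CHROM\tPOS\nchr1\t100\nchr1\t250\nchr2\t300\n' > run2.txt

awk 'FNR==1    { body = 0 }               # start of each file: header not seen yet
     /^#CHROM/ { body = 1; next }         # records begin after this line
     !body     { next }                   # ignore everything before the header
     FNR==NR   { seen[$0]; n1++; next }   # first file: store and count each record
     { n2++; if ($0 in seen) same++; else diff++ }   # second file: classify each record
     END { printf "run1 records: %d\nrun2 records: %d\nmatching: %d\nnot matching: %d\n",
                  n1, n2, same, diff }' run1.txt run2.txt > counts.txt
cat counts.txt
```

The counts for "matching" and "not matching" refer to records of the second file only. Records present only in the first file would need the same pass with the file order swapped.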