I am trying to compare two files and write the results to two output files: lines that are the same go to concordant.txt, and lines that do not match go to discordant.txt. Is there also a way to count the lines that come after a specific line (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT E1) in each file? Thank you :).
I have attached the two files to compare, along with the output of the command below. By my manual count (my math is not good), there appear to be 8786 lines that are the same and 100 that are not.
awk 'FNR==NR {a[$1]; next} $1 in a' IonXpress_009_run1.txt IonXpress_009_run2.txt > concordant.txt
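One possible way to get both output files in a single pass is sketched below. It assumes that "the same" means the whole line is identical in both files, so it compares $0 rather than $1 as the command above does. The sample_run1.txt and sample_run2.txt files are made-up stand-ins for the attached files, not real data.

```shell
# Tiny stand-in files (made-up data); swap in the real IonXpress files.
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 200 .' 'chr2 300 .' > sample_run1.txt
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 250 .' 'chr2 300 .' > sample_run2.txt

awk '
    FNR == NR    { seen[$0]; next }                  # first file: remember each whole line
    ($0 in seen) { print > "concordant.txt"; next }  # second file: line also occurs in the first
                 { print > "discordant.txt" }        # second file: line not found in the first
' sample_run1.txt sample_run2.txt
```

Note that this only reports lines from the second file. Lines that occur only in the first file would need a second pass with the file order swapped.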
Have you considered using the diff command and then working with that output? A lot of your questions fall into the 'one-off' category of code. Sometimes a piecewise approach allows you to reuse code for a different scenario.
I have never used the diff command, but it looks like the < is what is different?
It looks like this will give me the matching (w/o the <) and the non-matching (w/ the <) and then I can use awk to count the differences.
diff -y IonXpress_009_run1.txt IonXpress_009_run2.txt > concordent.txt
awk 'FNR==NR{c=NR;next}END{print (c==FNR)?"\nAll Good\n":"\nDifference of\t" c-FNR "\trecords\n"}' IonXpress_009_run1.txt IonXpress_009_run2.txt > concordent2.txt
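For counting, the default diff output may be easier to work with than the side-by-side -y layout. In the default format, lines found only in the first file start with < and lines found only in the second file start with >. In -y output the markers sit in the middle column instead, with | marking lines that differ. A rough sketch, again using made-up stand-in files and hypothetical variable names:

```shell
# Tiny stand-in files (made-up data); swap in the real IonXpress files.
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 200 .' 'chr2 300 .' > sample_run1.txt
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 250 .' 'chr2 300 .' > sample_run2.txt

# diff exits with status 1 when it finds differences, which is expected here.
diff sample_run1.txt sample_run2.txt > differences.txt || true

only_run1=$(grep -c '^<' differences.txt)   # lines present only in the first file
only_run2=$(grep -c '^>' differences.txt)   # lines present only in the second file
echo "only in run1: $only_run1, only in run2: $only_run2"
```

Keeping the diff output in its own file fits the piecewise approach: the same differences.txt can be fed to other commands later.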
Is there a way the awk command can also count the matching lines as well as the differences?
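A single awk call could do both counts, roughly as sketched here. The sketch assumes whole-line comparison and counts only the lines that come after the #CHROM header in each file. The file names and the counts.txt output name are placeholders, and the sample data is made up.

```shell
# Tiny stand-in files (made-up data); swap in the real IonXpress files.
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 200 .' 'chr2 300 .' > sample_run1.txt
printf '%s\n' '##meta' '#CHROM POS ID' 'chr1 100 .' 'chr1 250 .' 'chr2 300 .' > sample_run2.txt

awk '
    FNR == 1     { past_header = 0 }            # new file: header not seen yet
    /^#CHROM/    { past_header = 1; next }      # the header line itself is not counted
    !past_header { next }                       # skip everything before the header
    FNR == NR    { seen[$0]; total1++; next }   # first file: remember and count each line
                 { total2++; if ($0 in seen) same++; else differ++ }   # second file
    END {
        printf "run1 lines: %d\nrun2 lines: %d\nmatching: %d\nnot matching: %d\n",
               total1, total2, same, differ
    }
' sample_run1.txt sample_run2.txt > counts.txt
```

Here "matching" and "not matching" describe lines of the second file relative to the first, so a line that occurs only in the first file is not included in either number.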