Compare 2 text files with 1 column each and write the mismatched data to a 3rd file

Hi,
I need to compare 2 text files, each with around 60000 rows and 1 column, and write the mismatched data to a 3rd file.
file1 - file2 = file3

wc -l file1.txt
58112
wc -l file2.txt
55260
 
head -5 file1.txt
101214200123
101214700300
101250030067
101214100500
109912312312
 
head -5 file2.txt
101250030067
101214200123
101214700333
109912312312
101214700300

I can sort the files.
What should I do after that, or is there another way?

Thanks,
Divya

with awk

 $ awk 'NR==FNR{A[$1]++;next}{if(!A[$1]){print}else{A[$1]=0}}END{for(i in A){if(A[i]){print i}}}' file1 file2

101214700333
101214100500

You may also try this:

$ awk 'FNR==NR{A[$1]++;next}{if(!($1 in  A))print;else delete A[$1]}END{for (i in A)print i}' file1 file2

101214700333
101214100500

Can you please explain what exactly A[$1]++ does? It creates an array but does not seem to assign anything to it, and I do not understand the increment sign. Please help me understand this.

You could use the comm command for this:

comm -23 file1.txt file2.txt

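The files need to be sorted first, though. The -23 option suppresses the lines found only in file2.txt and the lines common to both, so only the lines unique to file1.txt are printed.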
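To make the sorting step concrete, here is a minimal sketch using the five sample lines from the question. The temporary directory and the .sorted file names are only assumptions for the demo. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Sample data copied from the head -5 output in the question.
tmp=$(mktemp -d)
printf '%s\n' 101214200123 101214700300 101250030067 101214100500 109912312312 > "$tmp/file1.txt"
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 101214700300 > "$tmp/file2.txt"

# comm expects both inputs to be sorted in the same collation order.
LC_ALL=C sort "$tmp/file1.txt" > "$tmp/file1.sorted"
LC_ALL=C sort "$tmp/file2.txt" > "$tmp/file2.sorted"

# -2 hides lines found only in file2, -3 hides lines common to both,
# so what remains is the lines found only in file1.
LC_ALL=C comm -23 "$tmp/file1.sorted" "$tmp/file2.sorted" > "$tmp/file3.txt"
cat "$tmp/file3.txt"
```

With this sample, file3.txt holds 101214100500, the only value of file1 that does not appear in file2.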

Please check the example below; it may clear up your doubts.

$ cat file

101250030067
101214200123
101214700333
109912312312
101214700300
101214700333
101214700333
109912312312
$ awk '{A[$1]++}END{for(i in A) print i,A[i]}' file

101214700333 3 # It occurs 3 times in the file, so A[101214700333]=3
101214200123 1 # It occurs once in the file, so A[101214200123]=1
109912312312 2 # It occurs twice in the file, so A[109912312312]=2
101214700300 1
101250030067 1
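On the "not assigning anything" part of the question: in awk, an array element that has never been set evaluates to 0 in a numeric context, so the first ++ on a new key both creates the element and sets it to 1. A small sketch with three made-up input lines (the values are borrowed from the sample, the input itself is only for illustration):

```shell
out=$(printf '%s\n' 101214700333 109912312312 101214700333 | awk '
    # A[$1] does not exist the first time a value is seen; awk treats it
    # as 0, so ++ creates the element and leaves it at 1.
    { A[$1]++ }
    # Print the final counters for the two distinct values.
    END { print A["101214700333"], A["109912312312"] }
')
echo "$out"
```

This prints "2 1": the first value was seen twice, the second once.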

With the membership test (x in A), it is indeed possible to create an array element without assigning it a value:

awk 'FNR==NR {A[$1]; next}
{if ($1 in  A) delete A[$1]; else print $1} END {for (i in A) print i}' file1 file2
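For anyone still puzzling over the one-liners above, here is the same logic spread over several lines with comments. It is only a sketch: the array name seen and the temporary directory are my own choices, and the result is piped through sort because the order of "for (v in seen)" is not defined.

```shell
tmp=$(mktemp -d)
printf '%s\n' 101214200123 101214700300 101250030067 101214100500 109912312312 > "$tmp/file1"
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 101214700300 > "$tmp/file2"

awk '
    # NR counts lines over all input files, FNR restarts at 1 for each file,
    # so NR == FNR is true only while the first file is being read.
    # Merely referencing seen[$1] creates the element, with an empty value.
    NR == FNR { seen[$1]; next }

    # Second file: a value that is not in the array exists only in file2.
    !($1 in seen) { print; next }

    # Otherwise the value is in both files, so forget it.
    { delete seen[$1] }

    # Whatever is still in the array exists only in file1.
    END { for (v in seen) print v }
' "$tmp/file1" "$tmp/file2" | sort > "$tmp/file3"
cat "$tmp/file3"
```

With the sample lines from the question, file3 ends up with 101214100500 and 101214700333.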

I need the missing data from file1 alone. How can we do that?

cat file1
 
101250030067
101214200123
101214700333
109912312312


cat file2
101250030067
101214200123
101214700333
101214700300

file3 should contain 109912312312 alone.
101214700300 is not needed.
That is, only the data from file1 that is missing in file2.

How about grep?

grep -v -f file2 file1 > file3

It works for the given sample input.

Since the 1st file contains 50000 lines, grep is taking too much time.
Is there a better way?

Try this:

awk 'NR==FNR{a[$1];next} !($1 in a){print $1}' file2 file1

output:

109912312312
cat file2
101250030067
101214200123
101214700333
101214700300
cat file1
101250030067
101214200123
101214700333
109912312312
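If you would rather stay with grep, it may be worth trying fixed-string, whole-line matching. Without -F every line of file2 is treated as a regular expression, and without -x a pattern also matches when it is only part of a longer line. I have not timed this on the real 50000-line files, so take the speed-up as an assumption. A sketch on the same sample (the temporary directory is only for the demo):

```shell
tmp=$(mktemp -d)
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 > "$tmp/file1"
printf '%s\n' 101250030067 101214200123 101214700333 101214700300 > "$tmp/file2"

# -F: treat each line of file2 as a fixed string, not a regular expression
# -x: a pattern must match a whole line of file1
# -v: keep the lines of file1 that match none of the patterns
grep -F -x -v -f "$tmp/file2" "$tmp/file1" > "$tmp/file3"
cat "$tmp/file3"
```

This writes 109912312312 to file3, the same result as the awk version.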