Hi,
I need to compare two text files, each with a single column and around 60,000 rows, and write the mismatched data to a third file:
file1 - file2 = file3
wc -l file1.txt
58112
wc -l file2.txt
55260
head -5 file1.txt
101214200123
101214700300
101250030067
101214100500
109912312312
head -5 file2.txt
101250030067
101214200123
101214700333
109912312312
101214700300
I can sort the files.
What should I do after that, or is there another way?
Thanks,
Divya
pamu
October 24, 2013, 10:46am
2
With awk:
$ awk 'NR==FNR{A[$1]++;next}{if(! A[$1]){print}else{A[$1]=0}}END{for(i in A){if(A[i]){print i}}}' file1 file2
101214700333
101214100500
You may also try this:
$ awk 'FNR==NR{A[$1]++;next}{if(!($1 in A))print;else delete A[$1]}END{for (i in A)print i}' file1 file2
101214700333
101214100500
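For readers who find the one-liner dense, here is the same logic spread over several lines with comments. It is only a sketch: the scratch directory, the file names under it, and diff.txt are made up for the illustration, and the data is the five-line sample from the first post.

```shell
# Recreate the five-line samples from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 101214200123 101214700300 101250030067 101214100500 109912312312 > "$tmp/file1"
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 101214700300 > "$tmp/file2"

awk '
    # First file: remember each value as a key of A.
    FNR == NR { A[$1]++; next }
    # Second file: a value that is not a key of A exists only in file2, so print it.
    !($1 in A) { print; next }
    # Otherwise the value is common to both files, so drop it from A.
    { delete A[$1] }
    # Whatever is still in A appeared only in file1.
    END { for (i in A) print i }
' "$tmp/file1" "$tmp/file2" > "$tmp/diff.txt"

cat "$tmp/diff.txt"
```

Note that this reports mismatches in both directions. Also, because a common value is deleted from A the first time it is seen, a value repeated inside file2 would be printed on its second occurrence; the sample data has no such repeats.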
Can you please explain what A[$1]++ does exactly? It seems to create an array without assigning anything to it, and I do not understand the increment sign. Please help me understand this.
Subbeh
October 28, 2013, 8:53am
5
You could use the comm command for this:
comm -23 file1.txt file2.txt
The files need to be sorted, though.
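A minimal sketch of the sort-then-comm approach, using the five-line samples from the first post. The scratch directory and the .sorted file names are assumptions made for the example.

```shell
# Recreate the five-line samples from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 101214200123 101214700300 101250030067 101214100500 109912312312 > "$tmp/file1.txt"
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 101214700300 > "$tmp/file2.txt"

# comm expects both inputs to be sorted in the same collation order.
sort "$tmp/file1.txt" > "$tmp/file1.sorted"
sort "$tmp/file2.txt" > "$tmp/file2.sorted"

# -2 suppresses lines unique to the second file, -3 suppresses lines common to both,
# so only the lines that appear in file1 alone are left.
comm -23 "$tmp/file1.sorted" "$tmp/file2.sorted" > "$tmp/file3.txt"

cat "$tmp/file3.txt"
```

With this sample the only line left is 101214100500, the one value that file2.txt does not contain.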
pamu
October 28, 2013, 9:11am
6
Please check the example below; it may clear up your doubts.
$ cat file
101250030067
101214200123
101214700333
109912312312
101214700300
101214700333
101214700333
109912312312
$ awk '{A[$1]++}END{for(i in A) print i,A[i]}' file
101214700333 3 # It occurs 3 times in the file, so A[i]=3
101214200123 1 # It occurs once in the file, so A[i]=1
109912312312 2 # It occurs twice in the file, so A[i]=2
101214700300 1
101250030067 1
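To watch the increment happen line by line, something like the following may help. It is an illustrative sketch only: the variable names out and before are invented, and the three input values are taken from the listing above.

```shell
out=$(printf '%s\n' 101214700333 109912312312 101214700333 |
    awk '{
        before = A[$1] + 0   # an unset element reads as empty, which is 0 in arithmetic
        A[$1]++              # creates the element if needed, then adds 1
        print $1, "before=" before, "after=" A[$1]
    }')
echo "$out"
```

The first time a value is seen its count goes from 0 to 1, and the repeated 101214700333 goes from 1 to 2. So A[$1]++ both creates the array element and counts the occurrences, with no separate assignment needed.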
Using the existence test (x in A), it is indeed possible to define an array element without a value:
awk 'FNR==NR {A[$1]; next}
     {if ($1 in A) delete A[$1]; else print $1}
     END {for (i in A) print i}' file1 file2
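A small sketch of the difference between referencing an element and testing for it. The two keys are borrowed from the sample data, and the variable names out, n and k are made up for the example.

```shell
out=$(awk 'BEGIN {
    A["101214700300"]    # bare reference: creates the key with an empty value
    if ("101214700300" in A) print "300 is a key"; else print "300 is not a key"
    if ("101214700333" in A) print "333 is a key"; else print "333 is not a key"
    n = 0
    for (k in A) n++     # the failed test above did not create a second key
    print "keys:", n
}')
echo "$out"
```

The bare reference is enough to make the key exist, while the in test only looks and never creates anything, which is why the array still holds a single key at the end.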
I need only the data that is in file1 but missing from file2. How can we do that?
cat file1
101250030067
101214200123
101214700333
109912312312
cat file2
101250030067
101214200123
101214700333
101214700300
file3 should contain 109912312312 alone, i.e. the data from file1 that is missing in file2.
101214700300 is not needed.
How about grep?
grep -v -f file2 file1 > file3
It works for the given sample input.
As the first file contains 50,000 lines, grep is taking too much time.
Is there a better way?
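One thing that may be worth trying before giving up on grep: by default each line of file2 is treated as a regular expression and may match anywhere in a line. Adding -F (fixed strings) and -x (whole-line match) avoids both, and fixed-string matching is often considerably faster with a large pattern file, although no timing was taken on the 50,000-line data. A sketch on the small sample, with a scratch directory invented for the example:

```shell
# Recreate the four-line samples from the post above in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 > "$tmp/file1"
printf '%s\n' 101250030067 101214200123 101214700333 101214700300 > "$tmp/file2"

# -F: patterns are fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -v: print the lines of file1 that match none of the patterns
# -f: read the patterns from file2, one per line
grep -F -x -v -f "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

On the sample this leaves 109912312312 in file3, the same result as before.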
Try this:
awk 'NR==FNR{a[$1];next} !($1 in a){print $1}' file2 file1
Output:
109912312312
cat file2
101250030067
101214200123
101214700333
101214700300
cat file1
101250030067
101214200123
101214700333
109912312312
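For anyone unsure how the two-file awk idiom works, here is a commented sketch that writes the result to file3. The scratch directory and the array name seen are assumptions for the illustration; the data is the sample shown above.

```shell
# Recreate the samples shown above in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 101250030067 101214200123 101214700333 109912312312 > "$tmp/file1"
printf '%s\n' 101250030067 101214200123 101214700333 101214700300 > "$tmp/file2"

awk '
    # NR counts lines across all inputs, FNR restarts for each file,
    # so NR == FNR holds only while the first file (file2) is being read.
    NR == FNR { seen[$1]; next }   # remember every value from file2
    !($1 in seen)                  # second file (file1): print values file2 lacks
' "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

file2 is listed first on purpose: it is the file loaded into memory, and file1 is then streamed against it in a single pass. One caveat is that if file2 is empty, NR == FNR also holds for file1 and nothing is printed.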