Compare two files and output difference, by first field using awk.

charles33 · November 3, 2011, 6:51pm

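It seems like a common task, but I haven't been able to find a solution: I need every line of vitalinfo.txt whose first field does not appear as a first field in vitallog.txt.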

vitallog.txt

1310,John,Hancock
13211,Steven,Mills
122,Jane,Doe
138,Thoms,Doe
1500,Micheal,May

vitalinfo.txt

12122,Jane,Thomas
122,Janes,Does
123,Paul,Kite

**OUTPUT**
vitalfiltered.txt

12122,Jane,Thomas
123,Paul,Kite

I have been at this for two days. I tried sort, uniq, grep, etc., and it seems the answer lies with awk.

Thank You.

ahamed101 · November 3, 2011, 9:51pm

cut -d, -f1 vitallog.txt | sed 's/.*/^&,/' | grep -vf - vitalinfo.txt > vitalfiltered.txt

awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]' vitallog.txt vitalinfo.txt > vitalfiltered.txt

--ahamed
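
For readers who want to see why the awk filter works, here is a minimal sketch that recreates the sample files from the first post and spells the same logic out with comments. The scratch directory and the array name `seen` are illustrative choices, not something from the thread.

```shell
#!/bin/sh
# Recreate the sample files from the first post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > vitallog.txt <<'EOF'
1310,John,Hancock
13211,Steven,Mills
122,Jane,Doe
138,Thoms,Doe
1500,Micheal,May
EOF

cat > vitalinfo.txt <<'EOF'
12122,Jane,Thomas
122,Janes,Does
123,Paul,Kite
EOF

# Same filtering rule as the one-liner, written out step by step.
# "seen" is a hypothetical array name chosen for readability.
awk -F, '
    # FNR restarts at 1 for every input file while NR keeps counting,
    # so NR == FNR is true only while the first file is being read.
    NR == FNR { seen[$1]; next }   # first file: remember each key
    !($1 in seen)                  # second file: print lines with an unseen key
' vitallog.txt vitalinfo.txt > vitalfiltered.txt

cat vitalfiltered.txt
```

On the sample data this prints the 12122 and 123 lines and drops the 122 line, because 122 is the only key present in both files. One caveat: if the first file is empty, NR == FNR stays true for the second file as well, so nothing is printed at all.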

charles33 · November 4, 2011, 12:55am

First off I want to thank you, ahamed101. Thanks!

OK, I was skeptical at first. I tried the awk solution on the sample above; it worked. Still skeptical, I tried it on a few large "test" data sets; it worked. I'm still cautious about running it on large "production" data sets, so I will keep testing and report back.

I am completely blown away so far, which means something has got to be missing; it's too good to be true!

If this works, I will make ahamed101 my COD WAW nickname.
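
One way to build confidence before a production run is to compare the awk result against an independent method. The sketch below is only a suggestion, not something proposed in the thread: it assumes `join` from coreutils is available and uses `join -v 2`, which prints the lines of the second input whose key has no match in the first. The intermediate file names are hypothetical.

```shell
#!/bin/sh
# Use the C locale so sort and join agree on ordering.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the first post; swap in real files to check them instead.
cat > vitallog.txt <<'EOF'
1310,John,Hancock
13211,Steven,Mills
122,Jane,Doe
138,Thoms,Doe
1500,Micheal,May
EOF

cat > vitalinfo.txt <<'EOF'
12122,Jane,Thomas
122,Janes,Does
123,Paul,Kite
EOF

# Method 1: the awk filter from the thread, sorted for comparison.
awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]' vitallog.txt vitalinfo.txt | sort > by_awk.txt

# Method 2: join needs both inputs sorted on the join field (field 1).
sort -t, -k1,1 vitallog.txt  > log.sorted
sort -t, -k1,1 vitalinfo.txt > info.sorted
join -t, -v 2 log.sorted info.sorted | sort > by_join.txt

# Report whether the two methods produced the same set of lines.
if cmp -s by_awk.txt by_join.txt; then
    echo "methods agree"
else
    echo "methods differ:"
    diff by_awk.txt by_join.txt
fi
```

On the sample data both methods yield the 12122 and 123 lines, so the script reports that they agree. Agreement on large files is evidence rather than proof, but a mismatch would point straight at the lines worth inspecting.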