Compare two files and do...

gc_sw · October 26, 2010, 7:01am

Hi people,
I have two text files:

file.txt:

cell137 1
cell337 1
cell355 1
cell355 2
cell355 3
cell360 1
cell360 2
cell360 3
...

file-new.txt:

cell137 1
cell355 1
cell355 2
cell355 3
cell360 1
cell360 2
cell370 3
...

The script should compare the two files and:

if the same entry exists in both files, write that entry to zero.txt;
if an entry in file.txt does not exist in file-new.txt, write that entry to back.txt;
if there is a new entry in file-new.txt, write that entry to new.txt.
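A minimal sketch of the same three-way split that does not need sorted input. It uses an awk associative array instead of comm (a swapped-in technique, not one proposed in this thread), assumes each whole line is one entry, and uses the file names from the post with the sample data shown above (without the "..." continuation). On Solaris, /usr/bin/awk is the old awk, so nawk is likely needed there.

```shell
#!/bin/sh
# Sketch: split lines into zero.txt / back.txt / new.txt without sorting.
# Assumption: an "entry" is a whole line. Caveat: NR == FNR misfires if file.txt is empty.
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Sample data from the post (the "..." continuation is omitted).
printf '%s\n' 'cell137 1' 'cell337 1' 'cell355 1' 'cell355 2' 'cell355 3' \
  'cell360 1' 'cell360 2' 'cell360 3' > file.txt
printf '%s\n' 'cell137 1' 'cell355 1' 'cell355 2' 'cell355 3' \
  'cell360 1' 'cell360 2' 'cell370 3' > file-new.txt

: > zero.txt; : > back.txt; : > new.txt   # make sure all three outputs exist

awk '
  NR == FNR   { old[$0] = 1; order[++n] = $0; next }    # file.txt: remember lines in order
  ($0 in old) { seen[$0] = 1; print > "zero.txt"; next } # line is in both files
              { print > "new.txt" }                      # line is only in file-new.txt
  END {                                                  # file.txt lines never seen in file-new.txt
    for (i = 1; i <= n; i++)
      if (!(order[i] in seen)) print order[i] > "back.txt"
  }
' file.txt file-new.txt
```

Walking file.txt through the indexed array order[] keeps back.txt in the original file.txt order, rather than the unspecified order of a `for (k in arr)` loop.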

I have tried:

comm /gc_sw/file.txt /gc_sw/file-new.txt

and the output is:

but I couldn't figure out how to process it!
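comm already does the three-way split: it prints lines only in the first file in column 1, lines only in the second file in column 2 (after one TAB), and lines in both files in column 3 (after two TABs). Instead of post-processing that layout, its standard -1, -2 and -3 options suppress columns, so three runs can produce the three files directly. A minimal sketch, assuming both files are sorted (as the sample data is) and using the file names from the post:

```shell
#!/bin/sh
# Sketch: let comm do the split itself with its column-suppression options.
#   -1 hides column 1 (lines only in the first file)
#   -2 hides column 2 (lines only in the second file)
#   -3 hides column 3 (lines in both files)
# Assumption: both input files are already sorted.
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Sample data from the post (the "..." continuation is omitted).
printf '%s\n' 'cell137 1' 'cell337 1' 'cell355 1' 'cell355 2' 'cell355 3' \
  'cell360 1' 'cell360 2' 'cell360 3' > file.txt
printf '%s\n' 'cell137 1' 'cell355 1' 'cell355 2' 'cell355 3' \
  'cell360 1' 'cell360 2' 'cell370 3' > file-new.txt

comm -12 file.txt file-new.txt > zero.txt   # keep only column 3: in both files
comm -23 file.txt file-new.txt > back.txt   # keep only column 1: only in file.txt
comm -13 file.txt file-new.txt > new.txt    # keep only column 2: only in file-new.txt
```

With a single column left, comm prints it without leading TABs, so no further processing is needed.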

rbatte1 · October 26, 2010, 7:15am

How about three steps, if we're just matching whole lines:

grep -Fxf file.txt file-new.txt > zero.txt      # Lines present in both files
grep -vFxf file.txt file-new.txt > new.txt      # Lines from file-new.txt not in file.txt
grep -vFxf file-new.txt file.txt > back.txt     # Lines from file.txt not in file-new.txt

Does that help?
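With plain `grep -f`, every line of the pattern file is a regular expression that may match anywhere in a line, so an entry such as cell355 1 would also match a line like cell355 10. A small sketch with hypothetical files patterns.txt and data.txt shows the difference that -F (fixed strings) and -x (whole-line match) make. On Solaris, the POSIX-conforming grep usually lives in /usr/xpg4/bin.

```shell
#!/bin/sh
# Sketch: substring regex matching vs. whole-line fixed-string matching with grep -f.
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

printf '%s\n' 'cell355 1' > patterns.txt               # hypothetical pattern file
printf '%s\n' 'cell355 1' 'cell355 10' > data.txt      # hypothetical data file

grep -f patterns.txt data.txt > substring.txt     # regex, match anywhere: both lines match
grep -Fxf patterns.txt data.txt > wholeline.txt   # fixed string, whole line: only "cell355 1"
```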

gc_sw · October 26, 2010, 8:47am

Hmm, that's a nice approach, rbatte1! Thanks for your help; it's a better solution. But since my machine runs Solaris, I have used:

fgrep -f file.txt file-new.txt > zero.txt
egrep -vf file.txt file-new.txt > new.txt
egrep -vf file-new.txt file.txt > back.txt

Scrutinizer · October 26, 2010, 9:15am

gc_sw:

hi people;
[..]i have tried;
comm /gc_sw/file.txt /gc_sw/file-new.txt
and the output is:
        cell137 1
cell337 1
        cell355 1
        cell355 2
        cell355 3
        cell360 1
        cell360 2
cell360 3
    cell370 3
but i couldn't process it!

Provided both files are sorted (comm requires sorted input):

comm file.txt file-new.txt |
  awk -F"\t" '$1!=""{print $1>"back.txt"} $2!=""{print $2>"new.txt"} $3!=""{print $3>"zero.txt"}'
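comm only gives meaningful columns when both inputs are sorted in the collating order it expects. A minimal sketch that first sorts both files into copies (the unsorted sample data and the *.sorted names are hypothetical) and sets LC_ALL=C so that sort and comm use the same byte-wise order, then splits the columns with awk. Each field is tested against the empty string rather than for truth, so an entry that happens to be 0 would not be skipped.

```shell
#!/bin/sh
# Sketch: sort both inputs, then split comm's TAB-separated columns with awk.
# LC_ALL=C makes sort and comm agree on one (byte-wise) collating order.
LC_ALL=C; export LC_ALL
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Hypothetical, deliberately unsorted sample data.
printf '%s\n' 'cell360 3' 'cell137 1' 'cell355 2' 'cell337 1' > file.txt
printf '%s\n' 'cell370 3' 'cell355 2' 'cell137 1' > file-new.txt

sort file.txt > file.sorted           # hypothetical names for the sorted copies
sort file-new.txt > file-new.sorted

: > back.txt; : > new.txt; : > zero.txt   # make sure all three outputs exist

comm file.sorted file-new.sorted |
  awk -F'\t' '
    $1 != "" { print $1 > "back.txt" }   # column 1: only in file.txt
    $2 != "" { print $2 > "new.txt" }    # column 2: only in file-new.txt
    $3 != "" { print $3 > "zero.txt" }   # column 3: in both files
  '
```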

gc_sw · October 26, 2010, 9:29am

Scrutinizer,
I have tried your code, but be careful: the red values are in $2 and the blue values are in $3.

Scrutinizer · October 26, 2010, 10:09am

comm uses TAB (\t) as its output delimiter, so -F"\t" makes sure awk puts each comm column in the right field. This works as long as the original files do not themselves use TAB as a separator. If they do, you need to choose a different output separator for comm (GNU comm has --output-delimiter, for example) and use the same one in awk. In the example you posted, there appeared to be spaces between the fields, and with that input I got the correct result. Did you test with the -F"\t" option?
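To see why the TAB field separator matters, a small sketch with hypothetical files a.txt and b.txt prints the number of fields and one field for each line of comm output, once with awk's default whitespace splitting and once with -F'\t'. With the default, leading TABs are skipped and the space inside each entry splits it, so every line looks alike; with TAB as the separator, NF equals the comm column the entry was in.

```shell
#!/bin/sh
# Sketch: compare awk's default field splitting with FS = TAB on comm output.
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

printf '%s\n' 'cell337 1' 'cell360 2' > a.txt   # hypothetical first file (sorted)
printf '%s\n' 'cell360 2' 'cell370 3' > b.txt   # hypothetical second file (sorted)

# comm prefixes column 2 with one TAB and column 3 with two TABs.
comm a.txt b.txt > comm.out

# Default FS: leading TABs are skipped and the space splits each entry,
# so every line has NF == 2 whatever its comm column was.
awk '{ print NF ":" $1 }' comm.out > default-fs.txt

# FS = TAB: the leading empty fields are kept, so NF equals the comm column
# and the last field is the whole entry.
awk -F'\t' '{ print NF ":" $NF }' comm.out > tab-fs.txt
```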

gc_sw · October 27, 2010, 3:37am

I wrote the output that way so it would look like the output of the comm command; the values are separated with TABs, so $2 means the red ones.

Anyway, I have tried -F"\t" and it works.

All of the scripts you have posted here work perfectly.

Thanks!