Compare two files and do...

Hi people,
I have two text files:

file.txt:

cell137 1
cell337 1
cell355 1
cell355 2
cell355 3
cell360 1
cell360 2
cell360 3
...

file-new.txt:

cell137 1
cell355 1
cell355 2
cell355 3
cell360 1
cell360 2
cell370 3
...

The script should compare the two files and:

  • if the same entry exists in both files, write 'zero' to a third file, zero.txt.
  • if an entry in file.txt does not exist in file-new.txt, write that entry to a third file, back.txt.
  • if there is a new entry in file-new.txt, write that entry to a third file, new.txt.

I have tried:

comm /gc_sw/file.txt /gc_sw/file-new.txt

and the output is:

                cell137 1
cell337 1
                cell355 1
                cell355 2
                cell355 3
                cell360 1
                cell360 2
cell360 3
        cell370 3

But I couldn't work out how to process that output!

How about three steps, if we're just matching lines:

grep -f file.txt file-new.txt > zero.txt       # Will collect matches
grep -vf file.txt file-new.txt > new.txt       # Will get lines from file-new.txt not in file.txt
grep -vf file-new.txt file.txt > back.txt      # Will get lines from file.txt not in file-new.txt

Does that help?
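
One caveat with the three steps above: with plain -f, each line of the pattern file is treated as a regular expression and may match anywhere in a line, so a pattern such as "cell355 1" would also match an entry like "cell355 10". A minimal sketch that asks for fixed-string, whole-line matches instead, using the POSIX grep options -F and -x. The sample data is copied from the question (the elided "..." lines are left out), and the temporary directory is only there so the demo does not overwrite real files. On Solaris these options may require /usr/xpg4/bin/grep rather than /usr/bin/grep; treat that as an assumption to check on your system.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 'cell137 1' 'cell337 1' 'cell355 1' 'cell355 2' \
              'cell355 3' 'cell360 1' 'cell360 2' 'cell360 3' > file.txt
printf '%s\n' 'cell137 1' 'cell355 1' 'cell355 2' 'cell355 3' \
              'cell360 1' 'cell360 2' 'cell370 3' > file-new.txt

# -F: patterns are fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -f: read the patterns from a file
# -v: invert the match
grep -Fxf  file.txt     file-new.txt > zero.txt   # entries present in both files
grep -Fxvf file.txt     file-new.txt > new.txt    # entries only in file-new.txt
grep -Fxvf file-new.txt file.txt     > back.txt   # entries only in file.txt
```

With this sample, new.txt should hold "cell370 3" and back.txt should hold "cell337 1" and "cell360 3". This approach does not need sorted input.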

Hmm, that's quite a nice approach, rbatte1! Thanks for your help; it is a better solution. But since my machine runs Solaris, I used:

fgrep -f file.txt file-new.txt > zero.txt
egrep -vf file.txt file-new.txt > new.txt
egrep -vf file-new.txt file.txt > back.txt


Provided the files are sorted:

comm file1 file2 | awk -F"\t" '$1{print $1>"back.txt"}$2{print $2>"new.txt"}$3{print $3>"zero.txt"}'
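
If the files are not already sorted, a sketch along these lines sorts them first and then feeds comm into awk. The names file.sorted and file-new.sorted are made up for this illustration, and the sample data is taken from the question. LC_ALL=C is set on both sort and comm so that they agree on the ordering. The test $1 != "" is used instead of a bare $1 so that an entry that happens to look like the number 0 is not skipped.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 'cell137 1' 'cell337 1' 'cell355 1' 'cell355 2' \
              'cell355 3' 'cell360 1' 'cell360 2' 'cell360 3' > file.txt
printf '%s\n' 'cell137 1' 'cell355 1' 'cell355 2' 'cell355 3' \
              'cell360 1' 'cell360 2' 'cell370 3' > file-new.txt

# comm needs both inputs sorted in the same collating order.
LC_ALL=C sort file.txt     > file.sorted
LC_ALL=C sort file-new.txt > file-new.sorted

# comm prints three TAB-separated columns:
#   column 1: lines only in the first file   -> back.txt
#   column 2: lines only in the second file  -> new.txt
#   column 3: lines present in both files    -> zero.txt
LC_ALL=C comm file.sorted file-new.sorted |
  awk -F'\t' '
    $1 != "" { print $1 > "back.txt" }
    $2 != "" { print $2 > "new.txt"  }
    $3 != "" { print $3 > "zero.txt" }
  '
```

Note that awk only creates an output file when at least one line is written to it, so a category with no entries leaves no file behind.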

Scrutinizer,
I have tried your code, but note that in the output below all of the red values are $2 and the blue values are $3.

                cell137 1
cell337 1
                cell355 1
                cell355 2
                cell355 3
                cell360 1
                cell360 2
cell360 3
        cell370 3

comm uses TAB (\t) as its default output delimiter, so -F"\t" makes sure awk puts each comm column in the right field. This works as long as the original files do not themselves use TAB as a separator. If they do, you need to select a different output separator for comm and use the same one for awk. In the example you posted there appeared to be spaces between the fields, and with that input I got the correct result. Did you test with the option -F"\t"?
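
If the entries themselves contain TABs, one way to sidestep the delimiter question altogether is to run comm three times and suppress two columns each time with the standard -1, -2 and -3 options. Each run then prints a single column with no leading TABs, so awk is not needed. This is only a sketch: a.txt and b.txt are hypothetical TAB-separated sample files made up for the illustration, and they are written already sorted.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files whose fields are separated by a TAB.
printf 'cell137\t1\ncell337\t1\ncell360\t3\n' > a.txt
printf 'cell137\t1\ncell370\t3\n'             > b.txt

# -1 / -2 / -3 suppress column 1 / 2 / 3 of the comm output.
LC_ALL=C comm -12 a.txt b.txt > zero.txt   # only column 3: lines in both files
LC_ALL=C comm -23 a.txt b.txt > back.txt   # only column 1: lines only in a.txt
LC_ALL=C comm -13 a.txt b.txt > new.txt    # only column 2: lines only in b.txt
```

The cost is that both files are read three times, which should not matter for small lists like these.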

I wrote the output that way so that it would display correctly. In the output of the "comm" command, values are separated with TAB, so $2 means the red ones.

Anyway, I have tried

-F"\t"

and it works.

All of the scripts you have posted in this forum work perfectly.

Thanks.