Comparing 2 text files to get unique values

smarty86 · December 16, 2008, 12:40pm

Hi all,

I have a problem comparing 2 text files: the result should contain only the unique values (those that are not repeated).

For eg:

file1.txt
1
2
3
4

file2.txt
2
3

So after comparing the above 2 files I should get only 1 and 4 as the output. Please help me out.

jim_mcnamara · December 16, 2008, 12:44pm

awk ' FILENAME=="file1" { if (!($0 in arr)) print $0 }   # file1 (read second): print lines not seen in file2
        FILENAME=="file2" { arr[$0]++ }                    # file2 (read first): remember every line
       ' file2 file1

vgersh99 · December 16, 2008, 12:45pm

( cat file1.txt file2.txt ) | sort | uniq -c | awk '$1==1 {print $2}'

smarty86 · December 16, 2008, 12:54pm

@vgersh99 @Jim

Thanks friends, your code works great.

It gave me the proper output. If you don't mind, can you please explain how it works?

Anyway, thanks for your help.

vgersh99 · December 16, 2008, 1:01pm

( cat file1.txt file2.txt ) | sort | uniq -c | nawk -v OFS='\t' '$1==1 {$1=$1;print}' | cut -f2-
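
To see what each stage of this pipeline contributes, it can be run on the sample files from the first post. This is only a minimal sketch: the scratch directory from `mktemp -d` and the use of plain `awk` in place of `nawk` are assumptions, not part of the original command.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory (assumption).
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 > "$dir/file1.txt"
printf '%s\n' 2 3     > "$dir/file2.txt"

# Stage 1: merge and sort, then count adjacent duplicates.
# Each line of $counted is "<count> <line>", with the count padded by leading blanks.
counted=$(cat "$dir/file1.txt" "$dir/file2.txt" | sort | uniq -c)

# Stage 2: keep lines whose count is 1; $1=$1 rebuilds the record with OFS (tab).
# Stage 3: cut drops the count column and keeps the rest of the line.
result=$(printf '%s\n' "$counted" |
    awk -v OFS='\t' '$1==1 {$1=$1; print}' | cut -f2-)

printf '%s\n' "$result"   # prints 1 and 4, one per line
rm -rf "$dir"
```

Printing `$counted` before stage 2 is a quick way to check what `uniq -c` produced for your own files.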

vgersh99 · December 16, 2008, 1:04pm

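Jim,
if the contents of file1 and file2 are reversed, this will not work: it only prints the lines of file1 that are missing from file2, never the lines that are only in file2.

Here's another alternative (assuming a given record/line appears only once in a given file):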

 nawk '{ if ($0 in a) delete a[$0]; else a[$0]} END { for (i in a) print i}' file1 file2
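
The difference between the two approaches shows up once file2 also has a line of its own. A minimal sketch, assuming plain `awk` instead of `nawk`, a scratch directory, and an extra line `5` in file2 that is not in the original example. The one-way lookup is written in a compact `NR==FNR` form rather than with the `FILENAME` tests, and `a[$0] = 1` is used where the one-liner writes a bare `a[$0]`; both are illustrative choices.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 > "$dir/file1"
printf '%s\n' 2 3 5   > "$dir/file2"   # 5 is hypothetical, added for this demo

# One-way lookup: remember file2, then print file1 lines that were not seen.
one_way=$(awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' \
    "$dir/file2" "$dir/file1")

# Toggle: the first sighting adds the line, the second sighting deletes it.
# "for (i in a)" visits keys in no particular order, hence the sort.
sym_diff=$(awk '{ if ($0 in a) delete a[$0]; else a[$0] = 1 }
    END { for (i in a) print i }' "$dir/file1" "$dir/file2" | sort)

printf 'one-way: %s\n' "$(printf '%s' "$one_way" | tr '\n' ' ')"    # one-way: 1 4
printf 'toggle:  %s\n' "$(printf '%s' "$sym_diff" | tr '\n' ' ')"   # toggle:  1 4 5
rm -rf "$dir"
```

The toggle relies on the stated assumption that a line appears at most once per file; a line repeated inside one file would be added and then deleted again.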

smarty86 · December 16, 2008, 1:05pm

@vgersh99

After your changes the first code is working well, thank you. Please explain the awk statement in it, if you don't mind.

vgersh99 · December 16, 2008, 1:26pm

nawk -v OFS='\t' '$1==1 {$1=$1;print}'

" -v OFS='\t' " - sets OFS, the output field separator, to '\t' (tab).
" $1==1 " - if field 1 ($1) is 1 (field 1 is the count that 'uniq -c' puts in front of each line, i.e. the number of occurrences in the combined file), then do {...}.

" {$1=$1; print} " - assigning to a field makes awk rebuild the current record/line, joining the fields with OFS, so the printed line is '\t' delimited and the leading blanks from 'uniq -c' are gone.

" | cut -f2- " - 'cut' uses '\t' as its default field delimiter, so this drops the count in field 1 and keeps everything from field 2 on.