How to find union of two files

stevefox · December 6, 2005, 3:45am

Is there a command in Unix to find the union of two files and remove the union from one of the files?

e.g. I have two files input1.txt and input2.txt with the contents below:

$ more input1.txt
4
2
3
2

$ more input2.txt
5
4
4
8
2

I want to find the union of the two and remove the union from input1.txt to output the below:
3

Any help will be appreciated.

futurelet · December 6, 2005, 4:39am

ruby -e 'puts File.readlines($*[0]) - File.readlines($*[1])' dat1 dat2

stevefox · December 6, 2005, 4:48am

Thanks futurelet.
Is there a way to do this without using ruby and only using standard HP Unix commands?

Livio · December 6, 2005, 8:30am

awk '{print $0}' input1.txt input2.txt |sort -u
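
For reference, here is a small sketch of what that pipeline prints for the sample files from the first post. It gives the set union in the usual sense (every distinct line from either file), while the "union" asked about in this thread appears to mean the lines common to both files. The file names s1.txt and s2.txt are made up for the illustration, and the first two lines overwrite input1.txt and input2.txt in the current directory.

```shell
# Recreate the sample files from the first post.
printf '%s\n' 4 2 3 2 > input1.txt
printf '%s\n' 5 4 4 8 2 > input2.txt

# Set union: every distinct line from either file.
awk '{print $0}' input1.txt input2.txt | sort -u
# prints 2, 3, 4, 5, 8 (one per line)

# Lines common to both files (what this thread calls the "union").
# Each file is de-duplicated first so that a line repeated inside
# one file is not mistaken for a common line.
sort -u input1.txt > s1.txt
sort -u input2.txt > s2.txt
sort s1.txt s2.txt | uniq -d
# prints 2 and 4
```

Removing 2 and 4 from input1.txt leaves 3, which is the output the original question asks for.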

stevefox · December 6, 2005, 9:25pm

Thanks, Livio.
However, your code did not give input1.txt minus the union of input1.txt and input2.txt, which is 3.

I worked out that I can do this with the code below, but is there a simpler way using standard Unix commands (possibly in one line)?

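# Lines that appear more than once across both files
cat input1.txt input2.txt | sort | uniq -d > union
# Count each line of input1.txt together with those repeated lines
cat input1.txt union | sort | uniq -c > union2
# Keep the lines counted exactly once and strip the count
sed -n 's/^ *1 //p' union2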

stevefox · December 7, 2005, 4:00am

I found out that it can be done with:

sort -u input1.txt > temp1.txt
sort -u input2.txt > temp2.txt
comm -23 temp1.txt temp2.txt
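
A short sketch of why this works, run against the sample files from the first post (the first two lines overwrite input1.txt and input2.txt in the current directory). comm expects sorted input and prints three columns; the -2 and -3 options suppress the second and third.

```shell
# Recreate the sample files from the first post.
printf '%s\n' 4 2 3 2 > input1.txt
printf '%s\n' 5 4 4 8 2 > input2.txt

# comm needs sorted input; -u also drops repeated lines.
sort -u input1.txt > temp1.txt
sort -u input2.txt > temp2.txt

# Column 1: lines only in temp1.txt
# Column 2: lines only in temp2.txt
# Column 3: lines in both files
# -23 hides columns 2 and 3, leaving the lines unique to temp1.txt.
comm -23 temp1.txt temp2.txt
# prints 3
```

Changing the option to -12 would instead print the lines common to both files (2 and 4 here).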

pixelbeat · December 7, 2005, 5:16am

I think what you're saying is: output the contents of file1 that are not in file2...

export LANG=C #for speed
sort input1.txt input2.txt input2.txt | uniq -u
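
A sketch of how the trick behaves, using the sample files from the first post plus a made-up file dup1.txt to show one caveat. Listing input2.txt twice means none of its lines can be unique in the sorted stream, so uniq -u keeps only lines seen exactly once overall. The sketch uses LC_ALL=C rather than LANG=C because LC_ALL takes precedence when both are set.

```shell
# Recreate the sample files from the first post.
printf '%s\n' 4 2 3 2 > input1.txt
printf '%s\n' 5 4 4 8 2 > input2.txt

# Every line of input2.txt occurs at least twice, so only lines
# that appear once in input1.txt and never in input2.txt survive.
LC_ALL=C sort input1.txt input2.txt input2.txt | uniq -u
# prints 3

# Caveat: a line repeated inside the first file is dropped as well,
# even when the second file does not contain it.
printf '%s\n' 7 7 3 > dup1.txt
LC_ALL=C sort dup1.txt input2.txt input2.txt | uniq -u
# prints 3 (7 is lost)

# De-duplicating the first file beforehand avoids that.
sort -u dup1.txt | LC_ALL=C sort - input2.txt input2.txt | uniq -u
# prints 3 and 7
```

For the sample data the caveat does not matter, because the repeated line 2 also occurs in input2.txt.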

stevefox · December 7, 2005, 8:32pm

Thanks, pixelbeat!
That's what I was looking for: the "-u" option for uniq.