What I really need is a script that compares two .csv text files line by line, each line holding a single entry, and then outputs the non-duplicate lines to a third .csv text file. The problem is that the lines may be exactly the same but in a different order in the two files, so
sourcefile1 contains
bob
jane
sally
sourcefile2 contains
sally
bob
output to > file3 containing
jane
I've tried using:
grep -vxf sourcefile1 sourcefile2 > file3
but I get no output. Is that because the lines are in a different order? If I do:
comm -13 sourcefile1 sourcefile2 > file3
I get the error "comm: file 1 is not in sorted order", but sorting it doesn't seem to help. I was thinking about writing a loop that said something like:
while read -r LINE
do
    found=0
    # scan the second file for a matching line
    while read -r LINE2
    do
        if [ "$LINE2" = "$LINE" ]
        then
            found=1
            break
        fi
    done < sourcefile2
    # append (>>, not >) only the lines that never matched
    [ "$found" -eq 0 ] && echo "$LINE" >> file3
done < sourcefile1
but even with the logic and syntax right, it seems like an inefficient way to do it. Are there better/more elegant ways to do this?
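One way around the comm error is to sort the inputs on the fly with process substitution. A minimal sketch, assuming bash (process substitution isn't plain sh) and the sourcefile1/sourcefile2 names from the example:

# comm needs sorted input; <(sort ...) sorts each file on the fly.
# -23 keeps only the lines unique to sourcefile1 (jane, in the example).
comm -23 <(sort sourcefile1) <(sort sourcefile2) > file3
# For lines unique to either file, suppress only the common column (-3);
# comm prefixes the second column with a tab, so strip it:
comm -3 <(sort sourcefile1) <(sort sourcefile2) | sed $'s/^\t//' > file3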
If there's a possibility of a name containing a regular expression metacharacter (such as a dot following an initial), then to strictly match an entire line with grep you'll want to use -xF along with whatever other options the logic of the solution demands.
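For example, a sketch using the file names above: -F treats each pattern as a fixed string, -x requires the whole line to match, and note the operand order, since the patterns come from the file you're subtracting:

# print every line of sourcefile1 that matches no whole line of
# sourcefile2, treating the patterns as fixed strings, not regexes
grep -vxFf sourcefile2 sourcefile1 > file3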
Using a single grep only reports lines missing from one of the files; it wouldn't see lines that the other file adds. The second version looks more elegant:
cat file1 file2 | sort | uniq -u > file3
though I'm not sure which method would use less CPU/memory? I'm sorting a few million records with this, so resource usage may be an issue. Actually, I have to repeat this process for many text files in a directory, so I may want to automate running it over every file in the directory, so that at the end I get a dump of only the unique records from all the files.
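As a sketch for the whole-directory case (the directory path and .csv extension are assumptions), a glob is safer than parsing the output of ls, and sort can take all the files in one invocation; GNU sort spills to temporary files when its buffer fills, so a few million records shouldn't exhaust memory:

# sort every .csv in the directory together, then keep only the
# records that occur exactly once across the whole set
sort -- /path/to/dir/*.csv | uniq -u > all_unique.csv

Note that uniq -u also drops a record that is duplicated within a single file, which may or may not be what you want.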