Compare two files

amir07 · October 8, 2008, 11:56am

I need to compare two files.
I have an input file, fileA, which needs to be compared with fileB, located in the /etc/lc/mbd directory.

Both files use this format:

abc01def:10.80.11.123

The input file format is:

abc01mns:10.80.11.1
dbc02mns:10.80.11.2
fbc01mns:10.80.11.3
rbc01mns:10.80.11.4
tbc01mps:10.80.11.5
abt05mns:10.80.11.6
zbc11mys:10.80.11.7
ttc01mns:10.80.11.8
hbc05mns:10.80.11.9
qbc01mns:10.80.11.10

After the comparison, the script should tell me which lines are duplicates and which are not.

Thanks

joeyg · October 8, 2008, 12:01pm

My preference is the comm command. From the manpages --

OPTIONS
     The following options are supported:

     -1       Suppresses the output column  of  lines  unique  to
              file1.

     -2       Suppresses the output column  of  lines  unique  to
              file2.

     -3       Suppresses the output column of lines duplicated in
              file1 and file2.

If you provide samples of both files, an example command could be created.
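
As a rough sketch of how the three options combine, assuming two small made-up files in the poster's name:address format (the sample data and the temporary directory are hypothetical, and mktemp is assumed to be available). Note that comm expects both inputs to be sorted, so the files are sorted first:

```shell
#!/bin/sh
# Sketch only: the sample data and the working directory are made up.
workdir=$(mktemp -d)
printf '%s\n' 'abc01mns:10.80.11.1' 'dbc02mns:10.80.11.2' 'fbc01mns:10.80.11.3' > "$workdir/fileA"
printf '%s\n' 'dbc02mns:10.80.11.2' 'zzz09mns:10.80.11.99' 'abc01mns:10.80.11.1' > "$workdir/fileB"

# comm expects both inputs sorted in the same collation order.
LC_ALL=C sort "$workdir/fileA" > "$workdir/fileA.sorted"
LC_ALL=C sort "$workdir/fileB" > "$workdir/fileB.sorted"

# -12 hides columns 1 and 2, leaving the lines present in both files.
in_both=$(LC_ALL=C comm -12 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
# -23 hides columns 2 and 3, leaving the lines found only in fileA.
only_in_a=$(LC_ALL=C comm -23 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
# -13 hides columns 1 and 3, leaving the lines found only in fileB.
only_in_b=$(LC_ALL=C comm -13 "$workdir/fileA.sorted" "$workdir/fileB.sorted")

printf 'In both files:\n%s\n' "$in_both"
printf 'Only in fileA:\n%s\n' "$only_in_a"
printf 'Only in fileB:\n%s\n' "$only_in_b"
```

Sorting into separate files keeps the originals untouched, at the cost of losing the original line order in the output.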

amir07 · October 8, 2008, 3:23pm

Thanks.

This approach works. What do you think?

#!/bin/ksh
while read myline
do
    cnt=0
    while read line
    do
        if [[ "$myline" = "$line" ]]
        then
            ((cnt+=1))
            break
        fi
    done < file1
    if [[ $cnt -eq 0 ]]
    then
        echo "$myline" >> output.file
    fi
done < file2
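
For comparison, the nested loop above writes every line of file2 that has no exact match in file1. A possible shortcut, swapped in here as an alternative and not taken from the script itself, is a single grep with fixed-string, whole-line matching. The sample data and the temporary directory below are made up:

```shell
#!/bin/sh
# Sketch only: hypothetical sample data standing in for file1 and file2.
workdir=$(mktemp -d)
printf '%s\n' 'abc01mns:10.80.11.1' 'dbc02mns:10.80.11.2' 'fbc01mns:10.80.11.3' > "$workdir/file1"
printf '%s\n' 'dbc02mns:10.80.11.2' 'zzz09mns:10.80.11.99' 'abc01mns:10.80.11.1' > "$workdir/file2"

# -F: treat patterns as fixed strings (the dots in the addresses stay literal)
# -x: a pattern must match the whole line
# -v: print the lines that match none of the patterns
# -f: read the patterns, one per line, from file1
grep -F -x -v -f "$workdir/file1" "$workdir/file2" > "$workdir/output.file"

not_in_file1=$(cat "$workdir/output.file")
printf 'Lines of file2 not found in file1:\n%s\n' "$not_in_file1"
```

Dropping -v would list the duplicates instead. Line position does not matter with this approach; each line of file2 is checked against every line of file1.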

amir07 · October 8, 2008, 3:30pm

But the problem is that if I have one additional line in either file, it does not compare correctly. It seems to compare only the lines at the same line number in each file.

matrixmadhan · October 8, 2008, 3:37pm

You can try something like this (not tested):

awk 'BEGIN { while ((getline line < "file_1") > 0) arr[line]++ } $0 in arr { printf "%s is duplicate\n", $0 }' file_2

treesloth · October 8, 2008, 3:41pm

The diff command seems ideal for this. In particular:

diff -y file1 file2

This will give a side-by-side comparison. The man page lists more options than I've ever dreamed of using, but -y seems to answer your needs.

Franklin52 · October 9, 2008, 2:35am

Try this; the results are stored in the files dup_file and no_dup_file:

awk 'NR==FNR{a[$0]=$0;next}
$0 in a {print $0 > "dup_file";next}
{print $0 > "no_dup_file"}
' fileA fileB

Use nawk or /usr/xpg4/bin/awk on Solaris.
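
To make the logic easier to follow, here is a commented sketch of the same idea run against made-up sample data (the file contents, the temporary directory, and the array name seen are assumptions for illustration). NR==FNR is only true while awk is reading the first file, so the first rule loads fileA into an array and the remaining rules classify each line of fileB:

```shell
#!/bin/sh
# Sketch only: hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'abc01mns:10.80.11.1' 'dbc02mns:10.80.11.2' 'fbc01mns:10.80.11.3' > fileA
printf '%s\n' 'dbc02mns:10.80.11.2' 'zzz09mns:10.80.11.99' 'abc01mns:10.80.11.1' > fileB

awk '
# NR counts lines across all files, FNR restarts for each file,
# so NR == FNR holds only while the first file (fileA) is being read.
NR == FNR { seen[$0] = 1; next }

# From here on we are in fileB: a line stored from fileA is a duplicate.
$0 in seen { print > "dup_file"; next }

# Anything left is a fileB line that fileA does not contain.
{ print > "no_dup_file" }
' fileA fileB

dups=$(cat dup_file)
non_dups=$(cat no_dup_file)
printf 'dup_file:\n%s\n' "$dups"
printf 'no_dup_file:\n%s\n' "$non_dups"
```

The lines keep the order they have in fileB, and no sorting is needed because the lookup is by line content, not by line position.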

Regards

amir07 · October 10, 2008, 12:04pm

Thanks, sdiff works. I also tried the code above, but I am getting a syntax error near line 2:

$ ./Comp2Files.awk
awk: syntax error near line 2
awk: bailing out near line 2

Franklin52 · October 11, 2008, 8:04am

Just type the command at the prompt or use a script like this:

#!/bin/sh

awk 'NR==FNR{a[$0]=$0;next}
$0 in a {print $0 > "dup_file";next}
{print $0 > "no_dup_file"}
' fileA fileB

Replace the filenames with your own and, if you get errors, use nawk, gawk or /usr/xpg4/bin/awk on Solaris.

Regards