Compare two files and print the lines that differ

I have two files like this:
#FILE 1
ABCD 4322 26485
JMTJ 5311 97248
XMPJ 4321 58978
#FILE 2
ABCD 4321 26485
JMTJ 5311 97248
XMPJ 4321 68978

What to do: Compare the two files and find the lines that don't match, then write them to a new file like this:
#FILE 3
"from file 1"
ABCD 4322 26485
XMPJ 4321 58978

"from file 2"
ABCD 4321 26485
XMPJ 4321 68978

many thanks!

This should do it.

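BEGIN { i=0; j=0 }
FNR == NR { a[i++]=$0; next }    # first file: remember every line
{
    c[j]=$0                      # second file: remember every line too
    if (a[j] != $0) b[j]=1; else b[j]=0;    # flag line numbers that differ
    j++;
}
END {
    print "from file 1"
    for (k=0; k < i; ++k) if (b[k]) print a[k]
    print "\nfrom file 2"
    for (k=0; k < j; ++k) if (b[k]) print c[k]    # print file 2 lines, not file 1 lines
}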

Have a look at the "diff" command. The output format will not be exactly like you want it to be but close enough and the functionality you want is there.

I hope this helps.

bakunin

{
echo "from file 1"; diff file1 file2 | awk '/^</ {sub( /< /, "", $0 ); print }'
echo "from file 2"; diff file1 file2 | awk '/^>/ {sub( /> /, "", $0 ); print }'
} > file3
 
Or just diff file1 file2

Try this:

cat file1 file2 | sort | uniq -u > file3

With the sample files, file3 then contains:

ABCD 4321 26485
ABCD 4322 26485
XMPJ 4321 58978
XMPJ 4321 68978

PS: If the output file needs to contain the text "from file ..." etc., please disregard this post.
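
If the "from file ..." labels are needed, the same set-difference idea can probably be kept by asking grep for the lines of one file that do not occur in the other. This is only a minimal sketch: it recreates the sample data from the first post under the names file1 and file2 (so run it in a scratch directory), and like the sort/uniq approach it ignores line position, so a line counts as matching if it appears anywhere in the other file.

```shell
#!/bin/sh
# Sample data from the first post, written out so the sketch is self-contained.
printf '%s\n' 'ABCD 4322 26485' 'JMTJ 5311 97248' 'XMPJ 4321 58978' > file1
printf '%s\n' 'ABCD 4321 26485' 'JMTJ 5311 97248' 'XMPJ 4321 68978' > file2

{
    echo '"from file 1"'
    # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from a file.
    # Prints the lines of file1 that do not occur anywhere in file2.
    grep -Fxv -f file2 file1
    echo
    echo '"from file 2"'
    # Same test in the other direction.
    grep -Fxv -f file1 file2
} > file3

cat file3
```

The options used (-F, -x, -v, -f) are all standard grep options, so no extra package should be required.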

Hi.

If you have access to Linux, a recent package, moreutils, contains a number of clever utilities, among them combine:

#!/usr/bin/env bash

# @(#) s1	Demonstrate combine from Linux package moreutils.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) combine
set -o nounset
echo

FILE1=data1
FILE2=data2

echo
echo " Data file $FILE1:"
cat $FILE1

echo
echo " Data file $FILE2:"
cat $FILE2

echo
echo " Results not in $FILE2:"
combine $FILE1 not $FILE2

echo
echo " Results not in $FILE1:"
combine $FILE2 not $FILE1

exit 0

Producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
combine - ( /usr/bin/combine Jun 28 2008 )


 Data file data1:
ABCD 4322 26485
JMTJ 5311 97248
XMPJ 4321 58978

 Data file data2:
ABCD 4321 26485
JMTJ 5311 97248
XMPJ 4321 68978

 Results not in data2:
ABCD 4322 26485
XMPJ 4321 58978

 Results not in data1:
ABCD 4321 26485
XMPJ 4321 68978

Best wishes ... cheers, drl

When comparing two files, do the two files need to have the same number of lines?

Hi.

If you are asking about combine, then no, the files need not have the same number of lines -- see example below. Your sample had the same number of lines in both files, so that is what I used in my first response. When you are posting sample data, it is best to ensure that all important characteristics are represented.

Note that not only is the number of lines different, but the order can be different too -- see the lines red and orange, for example ... cheers, drl

#!/usr/bin/env bash

# @(#) s2	Demonstrate combine from Linux package moreutils.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) combine
set -o nounset
echo

FILE1=data3
FILE2=data4

echo
echo " Data file $FILE1, $(wc -l <$FILE1) lines:"
cat $FILE1

echo
echo " Data file $FILE2, $(wc -l <$FILE2) lines:"
cat $FILE2

echo
echo " Results not in $FILE2:"
combine $FILE1 not $FILE2

echo
echo " Results not in $FILE1:"
combine $FILE2 not $FILE1

exit 0

producing:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
combine - ( /usr/bin/combine Jun 28 2008 )


 Data file data3, 10 lines:
pink
orange
red
yellow
green
blue
indigo
violet
black
mauve

 Data file data4, 7 lines:
silver
red
orange
yellow
green
blue
violet

 Results not in data4:
pink
indigo
black
mauve

 Results not in data3:
silver

Sorry, this is my revision:
File 1:
abc 123 def
ace 246 gik
xyz 357 vnc
File 2:
jgm 342 tpm
ace 246 gik
abc 321 def
xyz 357 vnc

The output file should be:
"Lines that contains mismatch"
abc 123 def
"This is the correct line"
abc 321 def

That's not the same question!

What do you want?

Lines where the 2nd column is not the same? Lines where the lines are nearly the same but not quite? Or in the case of your example, lines which are NOT the same (which is what you asked originally).

I'm sorry. I thought the files didn't need to have the same number of lines. My fault.

---------- Post updated at 03:16 AM ---------- Previous update was at 12:04 AM ----------

  1. The number of lines in the two files is not equal.
  2. The difference occurs in the 2nd or 3rd column of the line.
  3. The comparison is not between line 1 of file 1 and line 1 of file 2. The comparison is between line 1 of file 1 and the line in file 2 that has the same first word as the line in file 1.

I know it's kind of complicated. I'm sorry for the misunderstanding. Thanks!
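
The three rules above could be sketched in awk roughly as follows. This is not a settled answer, just one reading of the requirements, and it makes several assumptions: the first word is unique within each file, file 2 is the one holding the "correct" line, and lines whose first word appears in only one file are silently skipped. It recreates the revised sample data as file1 and file2, so run it in a scratch directory.

```shell
#!/bin/sh
# Sample data from the revised post, written out so the sketch is self-contained.
printf '%s\n' 'abc 123 def' 'ace 246 gik' 'xyz 357 vnc' > file1
printf '%s\n' 'jgm 342 tpm' 'ace 246 gik' 'abc 321 def' 'xyz 357 vnc' > file2

awk '
    # First file on the command line (file2): index each line by its first word
    # and keep columns 2 and 3 joined as one string for comparison.
    FNR == NR { line[$1] = $0; cols[$1] = $2 SUBSEP $3; next }

    # Second file on the command line (file1): report a line only if its first
    # word also exists in file2 and column 2 or column 3 differs.
    ($1 in line) && ($2 SUBSEP $3) != cols[$1] {
        print "\"Lines that contains mismatch\""
        print
        print "\"This is the correct line\""
        print line[$1]
    }
' file2 file1 > file3

cat file3
```

With more than one mismatch the two labels are repeated for each pair. Collecting the pairs in arrays and printing them in an END block would group them under a single pair of labels instead.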

So what should the output be if the files contain the following data?

$ 
$ cat file1
abc 123 def
ace 246 gik
xyz 357 vnc
pqr 789 stu
$ 
$ cat file2
jgm 342 tpm
ace 246 gik
abc 321 def
xyz 357 vnc
$ 
$ 

(a) What makes "abc 321 def" the correct line?
(b) Why is "abc 123 def" not the correct line?

tyler_durden