I have two files like this:

#FILE 1
ABCD 4322 26485
JMTJ 5311 97248
XMPJ 4321 58978

#FILE 2
ABCD 4321 26485
JMTJ 5311 97248
XMPJ 4321 68978

What to do: Compare the two files and find the lines that don't match, then write them to a new file like this:

#FILE 3
"from file 1"
ABCD 4322 26485
XMPJ 4321 58978
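BEGIN { i=0; j=0 }
# First file: store each line in a[].
FNR == NR { a[i++]=$0; next }
# Second file: store each line in c[] and flag the positions that differ.
{
c[j]=$0
if (a[j] != $0) b[j]=1; else b[j]=0;
j++;
}
END {
print "from file 1"
for (k=0; k < i; ++k) if (b[k]) print a[k]
print "\nfrom file 2"
# Print the second file's own lines here, not the first file's.
for (k=0; k < j; ++k) if (b[k]) print c[k]
}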
Have a look at the "diff" command. The output format will not be exactly what you want, but it is close, and the functionality you need is there.
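To make that a bit more concrete, here is a minimal sketch of how diff's default output could be reshaped into the layout asked for. The file names file1, file2, file3 and the scratch file changes are assumptions for illustration, and the sample data is copied from the question. Keep in mind that diff compares by position, so this only suits files whose lines appear in the same order.

```shell
#!/bin/sh
# Recreate the sample data from the question (file names are assumed).
printf '%s\n' 'ABCD 4322 26485' 'JMTJ 5311 97248' 'XMPJ 4321 58978' > file1
printf '%s\n' 'ABCD 4321 26485' 'JMTJ 5311 97248' 'XMPJ 4321 68978' > file2

# diff exits with status 1 when the files differ, so that is not an error here.
diff file1 file2 > changes || true

# In diff's default output, lines taken from the first file start with "< "
# and lines taken from the second file start with "> ".
{
    echo 'from file 1'
    sed -n 's/^< //p' changes
    echo
    echo 'from file 2'
    sed -n 's/^> //p' changes
} > file3

cat file3
```

The sed commands simply strip the "< " and "> " markers; everything else diff prints (the "1c1" style headers and the "---" separators) is dropped.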
If you are asking about combine, then no, the files need not have the same number of lines -- see example below. Your sample had the same number of lines in both files, so that is what I used in my first response. When you are posting sample data, it is best to ensure that all important characteristics are represented.
Note that not only is the number of lines different, but the order can differ too -- the lines red and orange, for example ... cheers, drl
#!/usr/bin/env bash
# @(#) s2 Demonstrate combine from Linux package moreutils.
echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) combine
set -o nounset
echo
FILE1=data3
FILE2=data4
echo
echo " Data file $FILE1, $(wc -l <$FILE1) lines:"
cat $FILE1
echo
echo " Data file $FILE2, $(wc -l <$FILE2) lines:"
cat $FILE2
echo
echo " Results not in $FILE2:"
combine $FILE1 not $FILE2
echo
echo " Results not in $FILE1:"
combine $FILE2 not $FILE1
exit 0
producing:
% ./s2
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0
GNU bash 3.2.39
combine - ( /usr/bin/combine Jun 28 2008 )
Data file data3, 10 lines:
pink
orange
red
yellow
green
blue
indigo
violet
black
mauve
Data file data4, 7 lines:
silver
red
orange
yellow
green
blue
violet
Results not in data4:
pink
indigo
black
mauve
Results not in data3:
silver
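If moreutils is not installed, grep can probably stand in for combine here. This is only a sketch of a rough equivalent, not something taken from the script above: -F treats each pattern as a fixed string, -x requires a whole-line match, -v inverts the match, and -f reads the patterns from a file. The data files are recreated from the listing above.

```shell
#!/bin/sh
# Recreate the two data files shown in the output above.
printf '%s\n' pink orange red yellow green blue indigo violet black mauve > data3
printf '%s\n' silver red orange yellow green blue violet > data4

echo " Results not in data4:"
# Lines of data3 that do not appear as a whole line anywhere in data4.
grep -Fxv -f data4 data3

echo " Results not in data3:"
# Lines of data4 that do not appear as a whole line anywhere in data3.
grep -Fxv -f data3 data4
```

As with combine, neither the line count nor the line order has to match between the two files.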
Lines where the 2nd column is not the same? Lines where the lines are nearly the same but not quite? Or in the case of your example, lines which are NOT the same (which is what you asked originally).
I'm sorry. I thought your files did not need to have the same number of lines. My fault.
---------- Post updated at 03:16 AM ---------- Previous update was at 12:04 AM ----------
The number of lines in the two files is not equal.
The difference occurs in the 2nd or 3rd column of the line.
The comparison is not between line 1 of file 1 and line 1 of file 2. The comparison is between line 1 of file 1 and the line in file 2 that has the same first word as the line in file 1.
I know it's kinda complicated. I'm sorry for the misunderstanding. Thanks!
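Given those three requirements, one possible approach is to index file 1 by its first word and look each line of file 2 up by that key. The following is a minimal sketch under some assumptions that the thread does not confirm: the first word is unique within each file, and a key that appears in only one file is ignored rather than reported. The file names and the extra QRST line (added to make the line counts differ) are hypothetical.

```shell
#!/bin/sh
# Sample data from the question; file2 is reordered and has one extra,
# made-up line (QRST) so that order and line count both differ.
printf '%s\n' 'ABCD 4322 26485' 'JMTJ 5311 97248' 'XMPJ 4321 58978' > file1
printf '%s\n' 'XMPJ 4321 68978' 'ABCD 4321 26485' 'JMTJ 5311 97248' 'QRST 1111 22222' > file2

awk '
    # First file: remember each line and its columns 2 and 3, keyed by column 1.
    # order[] keeps the keys in the order they appear in file 1.
    FNR == NR { line1[$1] = $0; rest1[$1] = $2 " " $3; order[++n] = $1; next }

    # Second file: keep a line only if its key exists in file 1 and
    # column 2 or column 3 differs (compared as strings).
    ($1 in line1) && rest1[$1] != ($2 " " $3) { line2[$1] = $0 }

    END {
        print "from file 1"
        for (k = 1; k <= n; k++) if (order[k] in line2) print line1[order[k]]
        print "\nfrom file 2"
        for (k = 1; k <= n; k++) if (order[k] in line2) print line2[order[k]]
    }
' file1 file2 > file3

cat file3
```

Both sections are printed in file 1's order, so matching lines sit at the same position in each list. If unpaired keys should be reported as well, the combine or grep approach on column 1 could be layered on top.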