Comparing two files

Hi

I have two files -- abc and xyz
contents of abc:

K","lr1207i04","1207","B  ","MTP2",60,60,0,0,960,960,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","ls1101i00","1301","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","","1301","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1303","A  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1304","A  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1304","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1305","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1305","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","ls1101i00","1306","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","ls1101n04","1306","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

contents of xyz

K","lr1207i04","1207","B  ","MTP2",60,60,0,0,960,960,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","ls1101i00","1301","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"","","1301","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1303","A  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1304","A  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1304","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1305","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","","1305","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","ls1101i00","1306","A  ","",0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
"K","ls1101n04","1306","B  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Hello,

Please use code tags for commands and code. Also let us know your desired Output please.

Thanks,
R. Singh

Is there any "easy" way to see whether these two files are the same?
If they differ, I need to print the differences.
I have a Perl script for that purpose, but its output seems erratic.
I want to do the same in a shell script now.
I am not sure where to begin.
Any help is welcome.

~Thanks

Have you tried the 'diff' command? How would you like the output printed if they differ?

These files are in .csv format.
I need to do this in a shell script only.

Seeing the diff is secondary.
First I want to know whether the two files are the same.

~thanks

diff is what you should start with. Do a diff and check the exit status.

--ahamed
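
A minimal sketch of the "check the exit status" idea, assuming bash. The function name report_same is made up for illustration; the exit codes are the usual diff ones (0 = identical, 1 = different, greater than 1 = trouble such as a missing file).

```shell
#!/bin/bash
# report_same FILE1 FILE2: hypothetical helper, not from the thread.
# diff output is discarded; only its exit status is inspected.
report_same() {
    diff "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "Files are the same" ;;
        1) echo "Files are different" ;;
        *) echo "Could not compare the files"; return 2 ;;
    esac
}
```

Called as report_same abc xyz, it prints one of the three messages. cmp -s can be dropped in for diff in the same way if only a yes/no answer is needed.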

man cmp :

that helped thanks

However,
what if I want to read each of the lines and put all of its values in an associative array?
e.g.

"K","","1303","A  ","",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

K
1303
A
0
0
0 .. etc

How would I write a snippet for that?

~thanks
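
One possible sketch for the associative-array question, assuming bash 4 or later (declare -A is not available in older shells). The names load_csv, cell and nfields and the "line,field" key scheme are assumptions made for illustration. The split on commas is naive: it assumes no quoted field contains a comma, which holds for the sample data above. Values are stored as-is apart from one surrounding pair of double quotes, so "A  " keeps its trailing blanks.

```shell
#!/bin/bash
# Requires bash 4+ (associative arrays).
declare -A cell      # cell["line,field"] = value of that field
declare -A nfields   # nfields[line]      = number of fields on that line

# load_csv FILE: hypothetical helper that fills cell and nfields.
load_csv() {
    local file=$1 line lineno=0 i v
    local -a fields
    # "|| [ -n "$line" ]" also picks up a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Naive split on commas; quoted commas are NOT handled.
        IFS=, read -r -a fields <<< "$line"
        nfields[$lineno]=${#fields[@]}
        for i in "${!fields[@]}"; do
            v=${fields[i]}
            v=${v#\"}; v=${v%\"}     # drop one surrounding pair of double quotes
            cell["$lineno,$((i + 1))"]=$v
        done
    done < "$file"
}
```

After load_csv abc, "${cell[5,3]}" holds the third field of line 5 and "${nfields[5]}" the number of fields on that line.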

Do you want to print the exact differences, or just the lines that are different?

the difference as well

The command below tells you whether the files differ:

if cmp -s file1.txt file2.txt; then echo "Files are the same"; else echo "Files are different"; fi

Computing the differences using only the shell needs a little work. I am thinking of going through both files and comparing each line of one file with the corresponding line of the other.

Where the lines differ, the differences could be echoed in a different color or something similar, but as I said, this needs a little work.
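
A rough sketch of that line-by-line idea in plain bash, reading both files in lockstep on two file descriptors. The name diff_lines is hypothetical and the sketch only reports line numbers, it does not color anything.

```shell
#!/bin/bash
# diff_lines FILE1 FILE2: hypothetical helper that prints the number
# of every line that differs, including lines present in only one file.
diff_lines() {
    local a b n=0 have_a have_b
    while :; do
        have_a=1 have_b=1
        # read fails at end of file; a non-empty value then means a
        # last line without a trailing newline, which still counts.
        IFS= read -r a <&3 || [ -n "$a" ] || have_a=0
        IFS= read -r b <&4 || [ -n "$b" ] || have_b=0
        if [ "$have_a" -eq 0 ] && [ "$have_b" -eq 0 ]; then
            break                       # both files are exhausted
        fi
        n=$((n + 1))
        if [ "$have_a" -ne "$have_b" ] || [ "$a" != "$b" ]; then
            echo "line $n differs"
        fi
    done 3< "$1" 4< "$2"                # fd 3 = first file, fd 4 = second file
}
```

Each file is read only once, so this stays cheap even for long files.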

Hi,
I came up with the following script :

#Date : 14.08.2013
#Author : Ionut Capitanu
#Purpose : Verify if 2 CSV files are different and show line by line differences
#Usage : ./cmp_files.sh <file1> <file2>

#Verify positional params
if [ $# -lt 2 ]; then
        echo "Not enough parameters, 2 files needed for comparison"
        echo "Usage : ./cmp_files.sh <file1> <file2>"
        exit 1
fi

#Get file names
FILE1=$1
FILE2=$2

#Verify that the files exist
if [[ ! -e $FILE1 ]]; then
        echo "File `pwd`/$FILE1 does not exist"
        exit 1
fi

if [[ ! -e $FILE2 ]]; then
        echo "File `pwd`/$FILE2 does not exist"
        exit 1
fi


#Colors
red="\e[0;31m"
NC="\e[0m"

#Verify if files are different
if cmp -s "$FILE1" "$FILE2"; then
                echo "Files are not different. No need for comparison"
                exit 0
else
          #Loop through files and compare fields
          MAXLINE=0
          #Redirecting the file into wc prints the count alone, without the file name
          NRLINES_F1=$(wc -l < "$FILE1")
          NRLINES_F2=$(wc -l < "$FILE2")

          if [ $NRLINES_F1 -gt $NRLINES_F2 ]; then
                  MAXLINE=$NRLINES_F1
              else
                  MAXLINE=$NRLINES_F2
          fi

          for (( j=1;j<=MAXLINE;j++ )); do
              #Both sides are quoted so the right-hand side is not treated as a pattern
              if [[ "$(awk -v j="$j" 'FNR == j {print $0}' "$FILE1")" != "$(awk -v j="$j" 'FNR == j {print $0}' "$FILE2")" ]]; then
                  #Find maximum number of fields between lines j of both files
                  NF1=$(awk -F, -v j="$j" 'FNR == j {print NF}' "$FILE1")
                  NF2=$(awk -F, -v j="$j" 'FNR == j {print NF}' "$FILE2")
                  MAX=${NF1:-0}
                  if [[ ${NF2:-0} -gt $MAX ]]; then
                      MAX=$NF2
                  fi

                  #Echo line j from FILE1
                  echo "$(pwd)/$FILE1,Line$j:" #>> result.file
                  awk -v j="$j" 'FNR == j {print $0}' "$FILE1" #>> result.file

                  #Echo header for line j from FILE2
                  echo "$(pwd)/$FILE2,Line$j:" #>> result.file

                  #Compute differences
                  for (( i=1;i<=MAX;i++ )); do
                      F1=$(awk -F, -v i="$i" -v j="$j" 'FNR == j {print $i}' "$FILE1")
                      F2=$(awk -F, -v i="$i" -v j="$j" 'FNR == j {print $i}' "$FILE2")

                      #Echo line from FILE2 with different fields in red
                      if [ "$F1" == "$F2" ]; then
                          #Left unquoted as in the original: blanks around the field are
                          #dropped (a field such as * would also be glob-expanded)
                          echo -n $F1"," #>> result.file
                      else
                          echo -n -e "${red}${F2},${NC}" #>> result.file
                      fi
                  done
                  echo -e "\n"
              fi
          done
fi

#END

Hope this helps you. Feel free to modify it, modularize it maybe with functions or something, bring in some more logic to it, etc.
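
As one direction for such a modification, here is a sketch that does the field-by-field comparison in a single awk pass instead of starting awk once per field. It is an alternative, not the script above: the name field_diff is made up, nothing is colored, the comma split is naive (no quoted commas), and the first file is assumed to be non-empty.

```shell
#!/bin/bash
# field_diff FILE1 FILE2: hypothetical helper, not part of the script above.
# Assumes FILE1 is not empty (the NR == FNR test needs at least one line)
# and that no quoted field contains a comma.
field_diff() {
    awk -F, '
        # First file: remember every line as a string.
        NR == FNR { line1[FNR] = $0 ""; n1 = FNR; next }
        # Second file: compare with the line of the same number.
        {
            n2 = FNR
            if (FNR > n1) { printf "line %d only in second file\n", FNR; next }
            if (($0 "") == line1[FNR]) next
            nf1 = split(line1[FNR], f1, ",")
            max = (nf1 > NF) ? nf1 : NF
            for (i = 1; i <= max; i++)
                if ((f1[i] "") != ($i ""))      # "" forces a string comparison
                    printf "line %d field %d: [%s] -> [%s]\n", FNR, i, f1[i], $i
        }
        END {
            # Lines the first file has beyond the end of the second file.
            for (l = n2 + 1; l <= n1; l++)
                printf "line %d only in first file\n", l
        }
    ' "$1" "$2"
}
```

field_diff file1.txt file2.txt prints one line per differing field, with the value from the first file on the left of the arrow and the value from the second file on the right.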

Test files :

file1.txt:
1,2,3,4,5,6,7,8,9,"G",F, T, D
1
4,5,5,
file2.txt:
T,2,3,4,0,6,7,8,ABC,"G",z, T, D,10,12
46,7,8,9,0,
4,5,5
Output:
/home/mind/tmp/file1.txt,Line1:
1,2,3,4,5,6,7,8,9,"G",F, T, D
/home/mind/tmp/file2.txt,Line1:
T,2,3,4,0,6,7,8,ABC,"G",z,T,D,10,12,

/home/mind/tmp/file1.txt,Line2:
1
/home/mind/tmp/file2.txt,Line2:
46,7,8,9,0,,

/home/mind/tmp/file1.txt,Line3:
4,5,5,
/home/mind/tmp/file2.txt,Line3:
4,5,5,,

/home/mind/tmp/file1.txt,Line4:
/home/mind/tmp/file2.txt,Line4:

The differences will be outlined in red.

EDIT : I modified it a little bit to output only different lines