Comparing the files

i150371485 · August 7, 2012, 3:16am

Hi Friends,

I have file1.txt

29123973�2012-05-29�35310124�00000000000469744762�00010�20�390��F
29123974�2012-05-29�35310125�00000000000469744770�00010�20�390��F
29123975�2012-05-29�35310126�00000000000469744804�00010�20�390��F
29123976�2012-05-29�35310127�00000000000469744820�00010�20�390��F
29123977�2012-05-29�35310128�00000000000469744846�00010�20�390��F
29123978�2012-05-29�35310129�00000000000469744895�00010�20�390��F
29123979�2012-05-29�35310130�00000000000469744903�00010�20�401��F
29123980�2012-05-29�35310131�00000000000469744911�00010�20�390��F
29123981�2012-05-29�35310132�00000000000469744929�00010�20�390��F
29123982�2012-05-29�35310133�00000000000469744952�00010�20�390��F

file2.txt

29123973�2012-05-29�35310124�00000000000469744762�00010�20�390��F
29123974�2012-05-29�35310125�00000000000469744770�00010�20�390��F
2923975�2012-05-29�35310126�00000000000469744804�00010�20�390��F
29123976�2012-05-29�35310127�00000000000469744820�00010�20�390��F
29123977�2012-05-29�35310128�00000000000469744846�00010�20�390��F
29123978�2012-05-29�35310129�00000000000469744895�00010�20�390��F
29123979�2012-05-29�35310130�00000000000469744903�00010�20�401��F
29123980�2012-05-29�35310131�00000000000469744911�00010�20�390��F
29123981�2012-05-29�35310132�00000000000469744929�00010�20�390��F
29123982�2012-05-29�35310133�00000000000469744952�00010�20�390��F

I tried using diff and comm, but I am not getting the expected output.

I want to see exactly where the mismatch occurs, ideally down to the field:

Sourcevalue|Targetvalue|Linenumber|field
29123975|2923975|3|1

Please help.

I tried with diff, but the output I got was not what I expected. Here is the script I tried:

#!/bin/bash
 
>extra.txt
>mismatch.txt
while read sLine; do
    OFS="$IFS"
    IFS="�"
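    sTab=( ${sLine} )    # split the source line into an array on IFS
    tLine="$(grep "^${sTab[0]}" file2.txt)"    # look up the same key in the target file
    if [ -z "$tLine" ]; then echo "$sLine" >>extra.txt; IFS="$OFS"; continue; fi
    tTab=( ${tLine} )    # split the matching target line the same way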
    for (( i = 1 ; i < ${#sTab[@]} ; i++ )); do
        [ "${sTab[$i]}" = "${tTab[$i]}" ] || echo "${sTab[0]}|$i|${sTab[$i]}|${tTab[$i]}" >>mismatch.txt
    done
    IFS="$OFS"
done <file1.txt
echo "Number of Extra records in Source file : $(cat extra.txt|wc -l)"
cat extra.txt
echo "Number of mismatches : $(cat mismatch.txt|wc -l)"
cat mismatch.txt

I got an error saying :

RudiC · August 7, 2012, 4:16am

Do you need a shell script solution? If not, you might want to try an awk command:

awk 'BEGIN {FS="�";OFS="|"}
    {getline lf1 < "file1"
      if (lf1!=$0) {b=split(lf1,a)
         for (i=1;i<=b;i++) if (a[i]!=$i) print a[i],$i,NR,i}
    }' file2

yielding

29123975|2923975|3|1

For each input line of file2 (held in $0), it uses getline to read the corresponding line of file1 into the variable lf1. If lf1 differs from $0, the fields of the two lines are compared one by one, and each mismatch is printed. Error checking still needs to be added.
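As a rough sketch of what that error checking might look like, the variant below tests the return value of getline so that files of different length are reported instead of producing spurious mismatches. The sample data, the ';' delimiter (standing in for the thread's delimiter), the variable name f1 and the "only in ..." labels are all assumptions made for illustration. It still compares line N with line N, so a line missing in the middle of one file will shift everything after it.

```shell
#!/bin/sh
# Hypothetical sample data; ';' stands in for the real field delimiter.
dir=$(mktemp -d)
printf '%s\n' '29123973;2012-05-29;390' '29123975;2012-05-29;390' > "$dir/file1"
printf '%s\n' '29123973;2012-05-29;390' '2923975;2012-05-29;390' \
              '29123976;2012-05-29;401' > "$dir/file2"

result=$(awk -v f1="$dir/file1" 'BEGIN { FS = ";"; OFS = "|" }
{
    # getline returns 1 on success, 0 at end of file, -1 on a read error
    if ((getline lf1 < f1) <= 0) {
        print "", $0, NR, "only in file2"
        next
    }
    if (lf1 != $0) {
        n = split(lf1, a)
        if (NF > n) n = NF              # also cover extra fields in file2
        for (i = 1; i <= n; i++)
            if (a[i] != $i) print a[i], $i, NR, i
    }
}
END {
    # anything still unread in file1 has no partner line in file2
    line = NR
    while ((getline lf1 < f1) > 0) print lf1, "", ++line, "only in file1"
}' "$dir/file2")

printf '%s\n' "$result"
```

With this sample data, the first output line reports the field mismatch on line 2 and the second reports the surplus third line of file2.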

i150371485 · August 8, 2012, 3:37am

@RudiC: Thanks for the reply and the explanation of the awk command. I will run it today, check the results, and let you know.

---------- Post updated at 01:07 PM ---------- Previous update was at 12:16 PM ----------

@RudiC, I have a couple of questions. Would you mind explaining? I have run a couple of scenarios. If file1 and file2 have the same number of records, it works fine. Below is the case where it does not work.

I ran the above awk command after removing the first line from file2, and every record was reported as a mismatch.

Please help. I am using ksh.

RudiC · August 8, 2012, 4:30am

As mentioned in my post, error checking was omitted, and it was based on the assumption of files of equal length.
diff is great at finding missing lines:

diff file1 file2
3c3
< 29123975�2012-05-29�35310126�00000000000469744804�00010�20�390����F
---
> 2923975�2012-05-29�35310126�00000000000469744804�00010�20�390����F
7d6
< 29123979�2012-05-29�35310130�00000000000469744903�00010�20�401����F
8a8
> 29123981�2012-05-29�35310132�00000000000469744929�00010�20�390����F

which reads as: line 7 of file1 does not exist in file2, and line 8 of file2 does not exist in file1.
So you could run diff first to find the line differences between the files, and then run the awk script to find the field differences. To me, it seems pointless to duplicate existing diff functionality in awk.

---------- Post updated at 10:30 AM ---------- Previous update was at 10:12 AM ----------

Actually, you could do something like

(diff -e file1 file2; echo w)|ed file1

to add the missing lines to file1, but you would need to remove the change commands (e.g. 3c3) first, for example by piping the diff output through sed: | sed '/.c/,+1d'

i150371485 · August 8, 2012, 5:24am

Hi RudiC, could you please explain the significance of "echo w" and "ed" in (diff -e file1 file2; echo w)|ed file1?

RudiC · August 8, 2012, 5:40am

ed is a text file editor (see man ed), and diff -e is designed to output ed-compatible commands, but without the final w (write) command, so that the file being worked on is not overwritten. So we build the command list (diff -e plus echo w) and pipe it into ed, which executes the commands to turn file1 into file2. If you need to see the deleted records, you might want to run diff twice: once to list the records, and once to apply the changes with ed.
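To make the pipeline concrete, here is a small sketch on made-up files (the sample contents and the file1.new copy are assumptions for illustration only). It first prints the ed commands that diff -e generates, then applies them plus a final w to a copy of file1, so the original stays untouched. The ed step is skipped if ed is not installed.

```shell
#!/bin/sh
# Hypothetical sample files; ';' stands in for the real field delimiter.
dir=$(mktemp -d)
printf '%s\n' 'a;1' 'b;2' 'c;3' 'd;4' > "$dir/file1"
printf '%s\n' 'a;1' 'c;3' 'd;4' 'e;5' > "$dir/file2"

# diff -e prints ed commands (here one append and one delete), with no final "w".
script=$(diff -e "$dir/file1" "$dir/file2")
printf '%s\n' "$script"

# Work on a copy so that file1 itself is preserved; "w" saves the edited buffer.
cp "$dir/file1" "$dir/file1.new"
if command -v ed >/dev/null 2>&1; then
    { printf '%s\n' "$script"; echo w; } | ed -s "$dir/file1.new"
    cmp -s "$dir/file1.new" "$dir/file2" && echo "file1.new now matches file2"
fi
```

Note that diff -e lists the commands from the bottom of the file upwards, so that earlier edits do not shift the line numbers used by later ones.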

i150371485 · August 10, 2012, 1:46am

@Rudic, Thanks very much