Compare two files based on integer part only

How can I do the following?

File A (three columns):

X1,Y1,1.01
X2,Y2,2.02
X3,Y3,4.03

File B (three columns):

X1,Y1,1
X2,Y2,2
X3,Y3,4.0005

I need to compare files A and B on the integer part of column 3. The first two rows should match, and the third row should not satisfy the criterion. The first two columns together identify a row uniquely, so no row is repeated within a file, and the same column 1 and column 2 pairs appear in both files. So the logic should compare the integer part of the third column for each row, matched on columns 1 and 2. Thanks.

Why not?
Integer part of row 3, column 3 in file A = int(4.03) = 4
Integer part of row 3, column 3 in file B = int(4.0005) = 4

So by your logic, row 3 in both files should be considered a match.
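
A quick way to confirm this is to ask awk directly. The sketch below wraps awk's built-in int() in a small shell function; the name int_part is made up for illustration. int() truncates rather than rounds, so a value such as 9.9997 comes out as 9.

```shell
# int_part is a hypothetical helper name used only for this illustration.
# awk's int() drops the fractional part (truncation, not rounding).
int_part() {
    awk -v v="$1" 'BEGIN { print int(v) }'
}

int_part 4.03      # row 3 of file A
int_part 4.0005    # row 3 of file B
int_part 9.9997    # truncated, not rounded up
```

Both row 3 values print 4, which is why the two rows count as a match.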

tyler_durden

Sorry, yes, you are right. Please consider the following file layout instead:
File A (three columns):

X1,Y1,1.01
X2,Y2,2.02
X3,Y3,4.03

File B (three columns):

X1,Y1,1
X2,Y2,2
X3,Y3,5.0005

Here's an idea -

$ 
$ 
$ cat filea
x1,y1,1.01
x2,y2,2.02
x3,y3,4.03
x4,y4,7.0001
x5,y5,9.9997
$ 
$ 
$ cat fileb
x1,y1,1
x2,y2,2
x3,y3,5.0005
x4,y4,7.9998
x5,y5,4.0003
$ 
$ 
$ awk -F, 'NR==FNR {x[NR]=$0}
           NR!=FNR {split(x[FNR],a,",");
                    if(int(a[3]) != int($3)) {printf("ROW %d\n< %s\n---\n> %s\n",FNR,x[FNR],$0)}
                   }' filea fileb
ROW 3
< x3,y3,4.03
---
> x3,y3,5.0005
ROW 5
< x5,y5,9.9997
---
> x5,y5,4.0003
$ 
$ 
$ 
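
The script above pairs rows by line number, so it assumes both files list their rows in the same order. If that cannot be guaranteed, a variant keyed on columns 1 and 2 might look like the sketch below. The function name compare_by_key, the KEY label in the output, and the reordered sample data are assumptions made for illustration.

```shell
# Hypothetical helper: compare_by_key FILE_A FILE_B
compare_by_key() {
    awk -F, '
        # First file: remember the integer part and the full line per key.
        NR == FNR { key = $1 FS $2; ipart[key] = int($3); line[key] = $0; next }
        # Second file: report keys whose integer part differs.
        {
            key = $1 FS $2
            if ((key in ipart) && ipart[key] != int($3))
                printf("KEY %s\n< %s\n---\n> %s\n", key, line[key], $0)
        }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'x1,y1,1.01' 'x2,y2,2.02' 'x3,y3,4.03' > "$tmp/filea"
# Same keys as filea, but listed in a different order (assumed data).
printf '%s\n' 'x3,y3,5.0005' 'x1,y1,1' 'x2,y2,2' > "$tmp/fileb"
compare_by_key "$tmp/filea" "$tmp/fileb"
```

Keys that exist in only one of the files are ignored in this sketch; reporting them would need an extra pass over the stored keys.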

tyler_durden

awk -F \. '{a=$1;b=$0 ;getline< "fileb"}{if ($1!=a)print b "|" $0}' filea


The following suggested command extracts the integer part by splitting on the decimal point (.):

awk -F \. '{a=$1;b=$0 ;getline< "fileb"}{if ($1!=a)print b "|" $0}' filea

In my case, columns 1 and 2 also contain dots (.), so this command does not return correct values. I need to compare on column 3 only. My files are as follows:

filea

X1.T1,Y1,1.01
X2,Y2.T2,2.02
X3.T3,Y3.T4,4.03

fileb

X1.T1,Y1,1
X2,Y2.T2,2
X3.T3,Y3.T4,5.03

Need to compare integer value of column 3 only.
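
To illustrate the problem on the first line of filea, here is a small sketch. Both helper names are made up for the demonstration. Splitting on the dot cuts the record inside column 1, whereas splitting on the comma and truncating column 3 gives the value that actually needs comparing.

```shell
# Both helper names are hypothetical and exist only for this demonstration.
first_field_by_dot() {
    printf '%s\n' "$1" | awk -F. '{ print $1 }'
}
int_of_col3() {
    printf '%s\n' "$1" | awk -F, '{ print int($3) }'
}

first_field_by_dot 'X1.T1,Y1,1.01'   # cut at the dot inside column 1
int_of_col3 'X1.T1,Y1,1.01'          # integer part of column 3
```

The first call prints X1 rather than anything related to column 3; the second prints 1.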

$
$
$ cat filea
X1.T1,Y1,1.01
X2,Y2.T2,2.02
X3.T3,Y3.T4,4.03
$
$ cat fileb
X1.T1,Y1,1
X2,Y2.T2,2
X3.T3,Y3.T4,5.03
$
$
$ awk -F, 'NR==FNR {x[NR]=$0}
           NR!=FNR {split(x[FNR],a,",");
                    if(int(a[3]) != int($3)) {printf("ROW %d\n< %s\n---\n> %s\n",FNR,x[FNR],$0)}
                   }' filea fileb
ROW 3
< X3.T3,Y3.T4,4.03
---
> X3.T3,Y3.T4,5.03
$
$

tyler_durden

I have tried this, but it does not work on my actual data. I have sent you one line of real data by private message; the code reports a difference even when both files are identical.

Post your real data over here.

tyler_durden

Even when the files are identical, the code reports a difference (the data is tab-delimited):

filea

Mechanical.Markdown.Directed.POS.$ WK17 10.5

fileb

Mechanical.Markdown.Directed.POS.$ WK17 10.5

awk -F \t 'NR==FNR {x[NR]=$0} NR!=FNR {split(x[FNR],a,"\t"); if(int(a[3]) != int($3)) {printf("ROW %d\n< %s\n",FNR,x[FNR],$0)} }' filea fileb

Nope, it doesn't. Check this out -

$ 
$ 
$ cat filea
Mechanical.Markdown.Directed.POS.$    WK17    10.5
$ 
$ cat fileb
Mechanical.Markdown.Directed.POS.$    WK17    10.5
$ 
$ ## show the contents of these files with ^I for TAB characters and $ for end-of-line
$ cat -et filea
Mechanical.Markdown.Directed.POS.$^IWK17^I10.5$
$ 
$ cat -et fileb
Mechanical.Markdown.Directed.POS.$^IWK17^I10.5$
$ 
$ 
$ ## now try the awk script, tweaked a little bit so that it displays a message for lines that match
$ awk -F"\t" 'NR==FNR {x[NR]=$0}
              NR!=FNR {split(x[FNR],a,"\t");
                       if(int(a[3]) != int($3)) {printf("ROW %d\n< %s\n---\n> %s\n",FNR,x[FNR],$0)}
                       else {print "ROW ",FNR,"is the same in both files"}
                      }' filea fileb
ROW  1 is the same in both files
$ 
$ 
$ 
$ ## now try the other case - edit one file so that the last field is different
$ 
$ sed 's/10.5/11.5/' filea >tmp && mv tmp filea
$ 
$ ## check the contents of both files again
$ cat filea
Mechanical.Markdown.Directed.POS.$    WK17    11.5
$ 
$ cat fileb
Mechanical.Markdown.Directed.POS.$    WK17    10.5
$ 
$ ## finally, try the awk script once again
$ awk -F"\t" 'NR==FNR {x[NR]=$0}
              NR!=FNR {split(x[FNR],a,"\t");
                       if(int(a[3]) != int($3)) {printf("ROW %d\n< %s\n---\n> %s\n",FNR,x[FNR],$0)}
                       else {print "ROW ",FNR,"is the same in both files"}
                      }' filea fileb
ROW 1
< Mechanical.Markdown.Directed.POS.$    WK17    11.5
---
> Mechanical.Markdown.Directed.POS.$    WK17    10.5
$ 
$ 

If your results are different, then my best guess is that one or both of the files aren't truly tab-delimited.
Check the octal dump of each file to see what exactly is in there.

od -bc filea
od -bc fileb
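
Another quick check, as a sketch: let awk count the tab-separated fields on each line. The helper name check_tab_fields and the expectation of exactly three fields are assumptions based on the sample line above. A file that really is tab-delimited should produce no output.

```shell
# check_tab_fields is a hypothetical helper name.
# It reports every line that does not split into exactly 3 tab-separated fields.
check_tab_fields() {
    awk -F'\t' 'NF != 3 { printf("line %d has %d field(s)\n", NR, NF) }' "$1"
}

tmp=$(mktemp -d)
# One sample file with real tabs, one with plain spaces (assumed test data).
printf 'Mechanical.Markdown.Directed.POS.$\tWK17\t10.5\n' > "$tmp/tabs"
printf 'Mechanical.Markdown.Directed.POS.$ WK17 10.5\n' > "$tmp/spaces"
check_tab_fields "$tmp/tabs"
check_tab_fields "$tmp/spaces"
```

Only the space-separated file is reported, because awk sees its whole line as a single field.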

tyler_durden


With new input:

awk -F, '
{split($3,x,".");a=$1 FS $2 FS x[1] ;b=$0}
{getline < "fileb" ; split ($3,y,".");}
{if (a!=$1 FS $2 FS y[1]) print b "|" $0}
' filea

X3.T3,Y3.T4,4.03|X3.T3,Y3.T4,5.03
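
One caveat with the getline approach, offered as a sketch rather than a correction: if fileb has fewer lines than filea, a failed getline leaves the current filea record in $0, so the extra rows end up compared against themselves and are never reported. A variant that reads into a variable and checks getline's return value could look like this. The function name compare_with_getline and the "no row in second file" message are made up for illustration.

```shell
# Hypothetical helper: compare_with_getline FILE_A FILE_B
compare_with_getline() {
    awk -F, -v other="$2" '
    {
        a_line = $0
        a_int  = int($3)
        # getline returns 1 on success, 0 at end of file, -1 on error.
        if ((getline b_line < other) <= 0) {
            print "no row in second file for: " a_line
            next
        }
        split(b_line, b, ",")
        if (a_int != int(b[3]))
            print a_line "|" b_line
    }' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' 'X1.T1,Y1,1.01' 'X2,Y2.T2,2.02' 'X3.T3,Y3.T4,4.03' > "$tmp/filea"
printf '%s\n' 'X1.T1,Y1,1' 'X2,Y2.T2,2' 'X3.T3,Y3.T4,5.03' > "$tmp/fileb"
compare_with_getline "$tmp/filea" "$tmp/fileb"
```

Like the original, this still pairs rows strictly by position, so both files need to be in the same order.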

Thanks durden_tyler and rdcwayx.