To count distinct fields in a row

Abhik · September 16, 2010, 1:36pm

I have a .dat file which contains data in this format:
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761

I need to compare each field in a row with the field in the same column of the next row, and count the differences between the two rows.
For example: 892!=921, 921!=342 and 342!=543, hence (count of the differences between row 0 and row 1) = 3.
Similarly, I need to count the differences between row 1 and row 2, row 2 and row 3, and so on, i.e. every consecutive pair (0-1, 1-2, 2-3, ...), not just 0-1, 2-3, 4-5, ...

Can anybody please help me with an awk script?
I tried using NR, but I could not find a way to refer back to a row that has already been read.

bartus11 · September 16, 2010, 1:46pm

 awk 'NR==1{split($0,a);next}{for (i in a){if (a[i]!=$i && i>1)b++};print a[1]"-"$1": "b;split($0,a);b=0}' file
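For anyone new to awk, here is the same one-liner spread over several lines with comments, wrapped in a shell function (the name diff_rows is only for illustration) and with the array subscript written out as a[i]. It keeps the previous row in array a and compares it field by field with the current row.

```shell
# Commented, multi-line form of the one-liner above.
# Reads the file(s) given as arguments, or stdin if none.
diff_rows() {
  awk '
    NR == 1 {                      # first line: nothing to compare against yet
      split($0, a)                 # a[1], a[2], ... hold the fields of this row
      next
    }
    {
      for (i in a)                 # visit every field of the PREVIOUS row
        if (a[i] != $i && i > 1)   # count it if it differs (field 1 is skipped)
          b++
      print a[1] "-" $1 ": " b     # e.g. "0-1: 3"
      split($0, a)                 # the current row becomes the previous row
      b = 0                        # reset the counter for the next pair
    }
  ' "$@"
}
```

On the four sample rows, `diff_rows file` should print `0-1: 3`, `1-2: 4` and `2-3: 4`. Two details probably explain the unexpected counts for rows of different length: the test `i > 1` still compares field 2, and the loop only visits fields of the previous row, so extra fields in a longer current row are never counted.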

Abhik · September 16, 2010, 2:11pm

Thanks for the solution. Using your script, I get a wrong difference count when the rows have different numbers of columns.
Can you please explain the script above? I am new to awk.

bartus11 · September 16, 2010, 2:24pm

So what should the result of comparing these two lines be?

1 3 921 342 543
2 4 817 562 718 765

Abhik · September 16, 2010, 2:26pm

As 921!=817, 342!=562, ...,
the difference is 4.
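One way to express a rule that gives 4 here is sketched below. It is only a sketch under stated assumptions: fields 1 and 2 are never compared, a field present in only one of the two rows counts as a difference (921!=817, 342!=562, 543!=718, plus the unmatched 765), and rows are numbered from 0. This is also how the awk script posted later in the thread behaves. The function name count_row_diffs is hypothetical.

```shell
# Sketch: count differing fields (from field 3 on) between consecutive rows.
# Assumption: a field that exists in only one of the two rows counts as a difference.
count_row_diffs() {
  awk '
    NR > 1 {
      max = (NF > prevNF) ? NF : prevNF            # walk the longer of the two rows
      d = 0
      for (i = 3; i <= max; i++)
        if (i > NF || i > prevNF || prev[i] != $i) # missing on one side, or different
          d++
      printf "rows %d and %d: %d\n", NR - 2, NR - 1, d
    }
    {
      prevNF = split($0, prev)   # split clears prev, then stores this row for the next pass
    }
  ' "$@"
}
```

On the five-line sample used in the Perl post, this should print 3, 4, 3 and 4 for the pairs 0-1, 1-2, 2-3 and 3-4.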

summer_cherry · September 16, 2010, 10:53pm

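use strict;
my $tmp;    # holds the previous line

# Count how many fields (from the third field on) differ between two rows.
# Note: the loop only runs up to the last index of the first (previous) row,
# so extra fields in a longer second row are not counted.
sub _comp {
  my $cnt = 0;    # start at 0 so "0" is printed when nothing differs
  my @a = @{$_[0]};
  my @b = @{$_[1]};
  for (my $i=2;$i<=$#a;$i++){
    $cnt++ if $a[$i] != $b[$i];
  }
  return $cnt;
}
while(<DATA>){
  if($.==1){
    $tmp=$_;    # first line: just remember it
  }
  else{
    my @arr1=split /\s+/, $tmp;
    my @arr2=split /\s+/, $_;
    my $diff = _comp(\@arr1,\@arr2);
    my ($prev,$cur)=($.-2,$.-1);    # 0-based line numbers
    print "Diff between line [$prev] and line [$cur] is $diff\n";
    $tmp=$_;    # the current line becomes the previous line
  }
}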
__DATA__
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829

rdcwayx · September 17, 2010, 12:44am

The result of the Perl script above has an issue (the difference between line 1 and line 2 should be 4, not 3):

Diff between line [0] and line [1] is 3
Diff between line [1] and line [2] is 3
Diff between line [2] and line [3] is 3
Diff between line [3] and line [4] is 4

Here is mine:

cat infile

0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829


awk '
NR==1{split($0,a);c=NF;next}
{ s=(c>NF)?c-NF:"0";}
{ for (i=3;i<=NF;i++) if (a[i]!=$i) b++ }
{print "Differences between Row", NR-1, "and Row",NR,")=",b+s;split($0,a);b=0;c=NF}
' infile

Differences between Row 1 and Row 2 )= 3
Differences between Row 2 and Row 3 )= 4
Differences between Row 3 and Row 4 )= 3
Differences between Row 4 and Row 5 )= 4

Abhik · September 17, 2010, 11:47am

Thanks a lot for the posts.
They do work.

---------- Post updated at 11:47 AM ---------- Previous update was at 11:34 AM ----------

@rdcwayx
Can you please explain your script?
I don't understand how you are able to use row 1 twice, once in each comparison. That was the problem I had.

rdcwayx · September 17, 2010, 7:56pm

The previous row is saved in array a, which is then compared with the new row:

split($0,a)

After printing, the current row is saved in array a again, so it becomes the previous row for the next line.
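The same idea in its simplest form is sketched here: keep the line just read in a variable and use it when the next line arrives. This stores the whole line in a scalar rather than splitting it into an array, and the names pair_lines and prev are only illustrative. Because every line is saved after it has been used, each line takes part in two comparisons, which gives the overlapping pairs 1-2, 2-3, ... rather than 1-2, 3-4, ...

```shell
# Print every consecutive pair of lines: 1-2, 2-3, 3-4, ...
pair_lines() {
  awk '
    NR > 1 { print prev " | " $0 }   # prev still holds the line read before this one
    { prev = $0 }                    # save the current line for the next record
  ' "$@"
}
```

For example, `printf 'a\nb\nc\n' | pair_lines` should print `a | b` followed by `b | c`.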