Calculate root mean square?

liuzhencc · September 8, 2015, 10:48am

Dear friend,

I know that for a single case this could be done quickly in Excel. But with hundreds of files, we definitely want to do it with a script or a FORTRAN program. Since I have no knowledge of FORTRAN, I tried to work out a script. The math is very simple: we choose one atom as the reference (in Cartesian x, y, z coordinates) and calculate the distance from each other atom to the reference atom as sqrt((x(i)-x(1))^2+(y(i)-y(1))^2+(z(i)-z(1))^2)
Problem explanation: we have a list of atoms with their coordinates

P               1.219142       0.637315      -0.824280
N              -0.265369       0.551699      -1.665314
P              -1.232957      -0.291982      -0.536350
C               1.550001       2.370502      -0.242803
C               0.756639       2.983923       0.747041
C               1.003518       4.293274       1.173466
C               2.045233       5.026004       0.616725
C               2.835510       4.452541      -0.372444
C               2.587869       3.143957      -0.800209
C               2.645951       0.045340      -1.859172

Assume we set P1 as the center atom. We then need to calculate the distance of every other atom from P1
and print the results as follows:

N2-P1 P3-P1 C4-P1 C5-P1 C6-P1 C7-P1 C8-P1 C9-P1 C10-P1
1.708 2.638 1.858  2.862 4.172  4.692 4.168 2.856 1.859

Thank you very much in advance!
Zhen

Scrutinizer · September 8, 2015, 11:33am

Hi,

Try:

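awk '
  NR==1 {                       # first line: the reference atom
    split($0,F)                 # F[1]=element, F[2..4]=x,y,z
    next
  }
  {
    h=h $1 NR "-" F[1]1 OFS     # label such as N2-P1
    r=r sqrt(($2-F[2])^2+($3-F[3])^2+($4-F[4])^2) OFS   # distance, turned into text via CONVFMT
  }
  END {
    print h ORS r               # label row, line break, distance row
  }
' CONVFMT="%.3f" file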

----

You could try this variation for multiple files:

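awk '
  FNR==1 {                      # first line of each input file
    if(NR>1)                    # not the very first file:
      print h ORS r             #   print the rows of the previous file
    split($0,F)                 # new reference atom
    h=r=x                       # reset both rows (x is never set, so it is empty)
    next
  }
  {
    h=h $1 FNR "-" F[1]1 OFS
    r=r sqrt(($2-F[2])^2+($3-F[3])^2+($4-F[4])^2) OFS
  }
  END {
    print h ORS r               # rows of the last file
  }
' CONVFMT="%.3f" file(s)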

liuzhencc · September 8, 2015, 12:45pm

Thanks so much for the script. It works like a charm.
Would you please explain how this script works, and why it is so fast? Because the script prints without formatting, the bond titles and bond distances are not aligned with each other when there are hundreds of bonds.
For instance, here there are 19 bond titles and 24 distances:

P2-Cr1 N3-Cr1 P4-Cr1 C5-Cr1 C6-Cr1 C7-Cr1 C8-Cr1 C9-Cr1 C10-Cr1 C11-Cr1 C12-Cr1 C13-Cr1 C14-Cr1 C15-Cr1 C16-Cr1 C17-Cr1 C18-Cr1 C19-Cr1 C20-Cr1
2.270 2.872 2.249 3.526 3.961 5.272 6.070 5.812 4.655 3.494 3.886 5.197 6.016 5.787 4.646 3.486 3.883 5.197 6.017 5.785 4.641 3.490 4.027 5.305

vgersh99 · September 8, 2015, 3:33pm

Lining up headers/values for Scrutinizer's code:

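BEGIN {
  CONVFMT="%.3f"                # numbers become text with 3 decimals
}
NR==1 {                         # first line: the reference atom
  split($0,F)
  next
}
{
  ht=sprintf("%-9s",$1 NR "-" F[1]1)    # label, left-justified in 9 columns
  rt=sprintf("%-9s",sqrt(($2-F[2])^2+($3-F[3])^2+($4-F[4])^2))    # distance, same width
  h=(h)? h OFS ht:ht            # join with OFS, no leading separator
  r=(r)? r OFS rt:rt
}
END {
  print h ORS r
}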
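
For many input files, the aligned output can be combined with the per-file reset from Scrutinizer's second script. What follows is only a minimal sketch, not code from either reply: the function name aligned_distances, the helper emit, and the sample file names mol1.txt and mol2.txt are made up for illustration, and it formats each distance with "%-9.3f" directly instead of going through CONVFMT.

```shell
# Two tiny sample inputs built from coordinates quoted in the question.
# The file names are hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/mol1.txt" <<'EOF'
P               1.219142       0.637315      -0.824280
N              -0.265369       0.551699      -1.665314
P              -1.232957      -0.291982      -0.536350
EOF
cat > "$tmp/mol2.txt" <<'EOF'
P               1.219142       0.637315      -0.824280
C               1.550001       2.370502      -0.242803
EOF

# Print one label row and one distance row per input file.
aligned_distances() {
  awk '
    function emit() {           # print the two rows collected so far
      if (h != "") print h ORS r
      h = r = ""
    }
    FNR == 1 {                  # first line of each file: reference atom
      emit()
      split($0, F)
      next
    }
    NF >= 4 {                   # any other atom line: label and distance
      h = h sprintf("%-9s", $1 FNR "-" F[1] "1") OFS
      r = r sprintf("%-9.3f", sqrt(($2-F[2])^2 + ($3-F[3])^2 + ($4-F[4])^2)) OFS
    }
    END { emit() }
  ' "$@"
}

aligned_distances "$tmp"/mol1.txt "$tmp"/mol2.txt
```

For the first sample file this gives the labels N2-P1 and P3-P1 over the distances 1.708 and 2.638, matching the values in the question. With real data the call would take a glob of the actual files in place of the two samples.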

liuzhencc · September 8, 2015, 3:54pm

Great! Now, these two rows are aligned within each column. The output is perfectly formatted.
Would you please explain a little about 'split', 'sprintf', 'OFS', and 'h ORS r'?
How do they work? It's an advanced script with so many keywords I've never seen before.
Thank you very much!
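
The four names asked about can be tried in isolation. The snippet below is a minimal sketch with made-up sample strings and a hypothetical wrapper function explain_keywords. It is an illustration only, not a reply from the thread's authors.

```shell
explain_keywords() {
  awk 'BEGIN {
    # split(string, array) cuts the string at whitespace, stores the
    # pieces in array[1], array[2], ... and returns how many there are.
    n = split("P 1.219142 0.637315 -0.824280", F)
    print n, F[1], F[4]

    # sprintf() builds a formatted string and returns it instead of
    # printing it; "%-9s" left-justifies the text in 9 columns.
    cell = sprintf("%-9s", "N2-P1")
    print "[" cell "]"

    # OFS (output field separator) is what print puts between items
    # separated by commas. It is a single space by default, so the
    # scripts above can also use it as a ready-made space in h and r.
    OFS = ";"
    print "a", "b"

    # ORS (output record separator) is what print appends at the end,
    # a line break by default. Writing h ORS r glues the two strings
    # together with a line break between them: two rows from one print.
    h = "N2-P1"; r = "1.708"
    print h ORS r
  }'
}

explain_keywords
```

This prints "4 P -0.824280", then "[N2-P1    ]", then "a;b", and finally N2-P1 and 1.708 on two separate lines.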