Hey, I'm not too good at this, so I only managed a clumsy and SLOW solution to my problem, and it needs a drastic speed-up. Any ideas how I could write the following in awk only?
What the code is supposed to do:
For every line, read the column values $6, $7 and $8 and do a calculation with the same column values of every other line in the same file. If the conditions are met, write the information out to a file.
CODE:
while read line; do
XI=$(echo $line | awk '{print $6}')
YI=$(echo $line | awk '{print $7}')
ZI=$(echo $line | awk '{print $8}')
ATOM_TYPE=$(echo $line | awk '{print $3}')
awk -v xi="$XI" -v yi="$YI" -v zi="$ZI" -v atom="$ATOM_TYPE" -v cutoff="$DISTCUT" '{dist=sqrt(( xi- $6)^2 + ( yi- $7)^2 + ( zi- $8)^2); if (dist <= cutoff && dist != 0) print atom, $3, dist}' sub_oxy_high >> oxy_dist_all
done < sub_oxy_high
INPUT:
ATOM 5202 C3 TB 347 47.749 6.795 193.827
ATOM 5203 C4 TB 347 46.729 7.915 193.597
ATOM 5204 O5 TB 347 47.109 9.075 193.407
ATOM 5205 O6 TB 347 45.329 7.594 193.517
...
OUTPUT:
C3 C4 9.999
C3 O5 9.999
C3 O6 9.999
...
And what's the value of DISTCUT for the output posted?
Try:
awk '{atom[NR]=$3;xi[NR]=$6;yi[NR]=$7;zi[NR]=$8}
END{
for(i=1;i<=NR;i++)
for(j=1;j<=NR;j++)
{
if(j==i) continue
dist=sqrt((xi[i]-xi[j])^2 + (yi[i]-yi[j])^2 + (zi[i]-zi[j])^2)
if(dist!=0 && dist<=cutoff)
print atom[i],atom[j],dist
}
}' cutoff="$DISTCUT" sub_oxy_high > oxy_dist_all
RudiC
October 2, 2012, 6:55am
awk '{for (i=3;i<=NF;i++) TMP[NR,i]=$i}
END {for (i=1;i<=NR;i++)
{for (j=NR;j>i;j--)
{dist = sqrt ( (TMP[i,6]-TMP[j,6])^2 + (TMP[i,7]-TMP[j,7])^2 + (TMP[i,8]-TMP[j,8])^2 );
if (dist != 0 && dist <= co) print TMP[i,3],TMP[j,3],dist
}
}
}
' co="$DISTCUT" sub_oxy_high
With the data from your example:
C3 O6 2.56728
C3 O5 2.40508
C3 C4 1.53222
C4 O6 1.43856
C4 O5 1.23535
O5 O6 2.31816
@elixir_sinari : too fast for me! But - you're outputting each pair of atoms twice; not sure if that's desired...
Is it? But then, that's a "faithful" conversion of that loop to an awk script.
C3 C4 1.53222
C3 O5 2.40508
C3 O6 2.56728
C4 C3 1.53222
C4 O5 1.23535
C4 O6 1.43856
O5 C3 2.40508
O5 C4 1.23535
O5 O6 2.31816
O6 C3 2.56728
O6 C4 1.43856
O6 O5 2.31816
is the output for the sample.
RudiC
October 2, 2012, 7:05am
elixir_sinari:
Is it?
Yes: e.g.
C3 C4 1.53222
C4 C3 1.53222
But maybe that's desired?
If it is not desired, a slight tweak will do the trick.
awk '{atom[NR]=$3;xi[NR]=$6;yi[NR]=$7;zi[NR]=$8}
END{
for(i=1;i<=NR;i++)
for(j=i+1;j<=NR;j++)
{
dist=sqrt((xi[i]-xi[j])^2 + (yi[i]-yi[j])^2 + (zi[i]-zi[j])^2)
if(dist!=0 && dist <=cutoff)
print atom[i],atom[j],dist
}
}' cutoff="$DISTCUT" sub_oxy_high > oxy_dist_all
You guys are awesome, thanks all around... Double entries were not desired, I just left the issue out because I didn't want to cause confusion.
DISTCUT=3.5 by the way, a geometric hydrogen bonding criterion in angstrom...
This forum is so good
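For anyone landing here later, a minimal self-contained sketch of the single-pass approach from this thread, run against the four sample lines from the question with DISTCUT=3.5. The temporary file and the names `input`, `result`, `x`, `y` and `z` are illustrative assumptions, not part of the original scripts; the temporary file simply stands in for sub_oxy_high.

```shell
#!/bin/sh
# Sketch only: file and variable names here are illustrative.
DISTCUT=3.5                      # cutoff in angstrom, as given in the thread
input=$(mktemp)                  # stands in for sub_oxy_high

# The four sample records from the question.
cat > "$input" <<'EOF'
ATOM 5202 C3 TB 347 47.749 6.795 193.827
ATOM 5203 C4 TB 347 46.729 7.915 193.597
ATOM 5204 O5 TB 347 47.109 9.075 193.407
ATOM 5205 O6 TB 347 45.329 7.594 193.517
EOF

result=$(awk -v cutoff="$DISTCUT" '
    # Store atom name and coordinates of every record, indexed by line number.
    { atom[NR] = $3; x[NR] = $6; y[NR] = $7; z[NR] = $8 }
    END {
        # Visit each unordered pair once (j starts after i).
        for (i = 1; i < NR; i++)
            for (j = i + 1; j <= NR; j++) {
                dist = sqrt((x[i]-x[j])^2 + (y[i]-y[j])^2 + (z[i]-z[j])^2)
                if (dist != 0 && dist <= cutoff)
                    print atom[i], atom[j], dist
            }
    }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

With the sample data all six pairs fall within the cutoff, so each pair is printed exactly once. If three decimals are preferred, as in the placeholder output of the question, swapping the print for printf "%s %s %.3f\n", atom[i], atom[j], dist should do it.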