Hello guys!
I would like to use awk to perform data extraction from several files. The data files look like this:
DWT26R 1 PEP1 CA 1 OH2 SKIPPED: 0 STEP: 1
0.29000E+01 0.55005E-02 0.60012E-03
0.30000E+01 0.11149E+00 0.13603E-01
0.31000E+01 0.39719E+00 0.63013E-01
0.32000E+01 0.94264E+00 0.18784E+00
0.33000E+01 0.17744E+01 0.43749E+00
0.35000E+01 0.32350E+01 0.13273E+01
0.36000E+01 0.34913E+01 0.19104E+01
.
.
.
The first line is unique for each file and contains information I would like to add to the output. I need to search for the highest value in $2 and print it together with the first line of that file. Then the next file needs to be processed the same way.
For a single file it works fine, but how can I do this with multiple files? I think I somehow need to assign the information from the unique first line to the values of each file and store it in an array. At the end I would simply print that array containing this information... However, I could not get it to work so far.
The current code that works for a single file is:
BEGIN {
print "trajectory= traj molecules= mol Peptide= pep resid(CA?)= res contact= so (max)solv/sphere= n Radius(A)= r";
print "traj", "mol", "pep", "res", "so", "n", " r"; #just a header for the output
}
# need to read substrings in order to split each value into mantissa and exponent
{
if (NR==1) {
expo=0;
comp=0;
co=0;
max=0;
maxline=0;
traj=$2;
mol=$1;
pep=$3;
res=$5;
so=$6;
} #saving file information and resetting comparison set
else {
expo=10^(substr($2,9,3)); #extract exponent
comp=(substr($2,3,5)/100000);
co=comp*expo;
if (co > max) {max=co; maxline=substr($1,3,5)/100000*10^(substr($1,9,3))} # extract highest value from file
}
}
END {
print traj, mol, pep, res, so, max, maxline; #print highest value and information from the first line
}
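One direction for the multi-file case, as a minimal sketch rather than a finished solution: awk resets FNR to 1 at the start of every input file, so the reset that the code above ties to NR==1 could be tied to FNR==1 instead, and the result of the previous file printed at that point. No array is needed then. The sketch assumes every file starts with one header line in the layout shown above and that awk converts values such as 0.55005E-02 to numbers on its own (adding 0 forces the conversion), which would make the substr() arithmetic unnecessary. The function name max_per_file is made up for illustration.

```sh
# Hypothetical helper: prints one summary line per input file.
max_per_file() {
    awk '
    FNR == 1 {                                  # first line of each input file
        if (NR > 1) print info, max, maxline    # flush the result of the previous file
        info = $2 OFS $1 OFS $3 OFS $5 OFS $6   # traj, mol, pep, res, so
        max = ""; maxline = ""                  # reset the comparison for the new file
        next
    }
    NF >= 2 && (max == "" || $2 + 0 > max) {    # skip lines without a second column
        max = $2 + 0                            # highest value seen in column 2
        maxline = $1 + 0                        # column 1 value on that same line
    }
    END { if (NR > 0) print info, max, maxline }  # flush the last file
    ' "$@"
}
```

It would be called with all data files at once, for example max_per_file file1.dat file2.dat (placeholder names). Unlike the single-file version, which starts from max=0, this sketch starts from an empty max, so it also picks up a maximum that is zero or negative.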
Hope you guys can help me out.
Cheers,
Daniel