Taking the averages of columns with deletion of some lines

begin_shell · May 12, 2013, 2:34pm

Hi,

I am post-processing some of my results and want to plot the data against the three axes x, y, z. The data file is quite complicated, and I have to take the average of x, y, z separately for each step of my test. A typical file looks like this:

Time taken:4s
No.of series : 3
Step:1
ID   X       Y      z
1    0.1    0.4   0.45
2    1.2   -1.2   0.25
3   -0.5   -1.2   0.26
3    0.8   -2.1   1.45
1    1.2   -8.2   0.25
2    0.25   1.2   0.25
Time taken:5s
No.of series : 3
Step:2
ID   X       Y      z
3    1.0    0.1    0.5
1   -2.0   -0.2    0.25
2   -0.4   -0.2   -0.60
2   -0.1   -1.1   -0.45
3    0.2   -0.2    0.25
1    1.1   -1.2    0.25

What do I want to do?
From the above data, I want the average of the X, Y, z values (columns 2, 3, 4) for each ID (column 1), with each block titled by its corresponding step number (i.e. Step).

The original file is much bigger: the number of IDs, shown here as 3, is actually 600, and there are around 25000 steps, of which only two are shown here. I also want the following lines, which appear between the steps, to be deleted automatically:

Time taken:5s
No.of series : 3

Expected output: I need the output as a file that looks exactly like this:

Step:1
ID   X       Y       z
1    0.65   -3.9     0.35
2    0.725   0       0.25
3    0.15   -1.65    0.855
Step:2
ID   X       Y       z
1   -0.45   -0.7     0.25
2   -0.25   -0.65   -0.525
3    0.6    -0.05    0.375

I would be very thankful for your prompt input on a script that achieves the above result.

bartus11 · May 12, 2013, 2:59pm

Put this in "script.awk":

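# Pass the "Step:" and "ID" header lines through unchanged.
/^Step/
/^ID/
# A "Time" line starts a new step: print the averages collected so far.
/^Time/ && p {
    for (i in c) {
      print i" "x[i]/c[i]" "y[i]/c[i]" "z[i]/c[i]
      # Drop the entry so an ID missing from a later step cannot divide by zero.
      delete c[i]
      delete x[i]
      delete y[i]
      delete z[i]
    }
}
# Data line: count the row and add up X, Y, z for this ID.
/^[0-9]/ {
  c[$1]++
  x[$1]+=$2
  y[$1]+=$3
  z[$1]+=$4
  p=1
}
# Print the averages for the last step.
END {
  for (i in c) {
    print i" "x[i]/c[i]" "y[i]/c[i]" "z[i]/c[i]
  }
}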

Then run:

awk -f script.awk datafile

begin_shell · May 12, 2013, 3:13pm

Hi, thanks for your reply. The script runs, but it does not do what I intended. It displays the averages of the x, y, z columns over the whole file. What I need is the averages of X, Y, Z for each step separately, saved in one file as shown in the expected output.

When I run the script, it displays only one set of X, Y, Z averages for both steps combined. I want one set of averages for step 1 and a separate set for step 2.

bartus11 · May 12, 2013, 3:19pm

Can you post the output that you are getting? I get something like this:

# awk -f script.awk datafile
Step:1
ID   X       Y      z
1 0.65 -3.9 0.35
2 0.725 0 0.25
3 0.15 -1.65 0.855
Step:2
ID   X       Y      z
1 -0.45 -0.7 0.25
2 -0.25 -0.65 -0.525
3 0.6 -0.05 0.375

begin_shell · May 12, 2013, 3:29pm

Hi,
I am extremely sorry. I get the same output as you for the data I pasted. But when I run it on the actual data, it averages over all timesteps together. The actual data looks like this:

SOURCE: TIMESTEP
1
SOURCE: NUMBER OF SERIES
594
ITEM: SHAPE CONSTRAIN pp pp pp
-10 30
-10 30
-10 30
ITEM: SERIES id type x y z 
300 1 -0.018638 -2.55234 -6.32752 
100 1 1.41254 -2.58375 -6.49904 
200 1 -2.83075 0.0568085 -6.31853 
100 1 -2.13227 -1.15906 -6.22779 
200 1 -0.72638 -1.26406 -6.59486 
300 1 0.0490859 0.005266 -6.90442 
SOURCE: TIMESTEP
2
SOURCE: NUMBER OF SERIES
594
ITEM: SHAPE CONSTRAIN pp pp pp
-10 30
-10 30
-10 30
ITEM: SERIES id type x y z 
100 1 -0.458893 -1.85236 -6.76382 
200 1 2.44809 -1.75353 -6.94784 
300 1 1.69774 -0.511574 -6.93282 
300 1 -2.26969 -5.05726 -4.42174 
200 1 0.629752 -5.01888 -4.87928 
100 1 2.07906 -4.80346 -4.90814

Issue: when I run the script on this, it takes the average of X, Y, Z over all timesteps. I would be really thankful to get separate X, Y, Z averages for each TIMESTEP.

And I do not want this displayed in the output:

SOURCE: NUMBER OF SERIES
594
ITEM: SHAPE CONSTRAIN pp pp pp
-10 30
-10 30
-10 30

bartus11 · May 12, 2013, 3:48pm

Try this:

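# A "SOURCE: TIMESTEP" line starts a new block.
/^SOURCE: TIMESTEP/ {
  # Print the per-ID averages of the previous block, then reset them.
  if (p) {
    for (i in c) {
      print i" "x[i]/c[i]" "y[i]/c[i]" "z[i]/c[i]
      delete c[i]
      delete x[i]
      delete y[i]
      delete z[i]
    }
  }
  print      # the "SOURCE: TIMESTEP" line itself
  getline    # read the next line, which holds the timestep number
  print
}
# Data rows start with a digit and have 5 fields: id type x y z.
# The other header lines never match, so they are not printed.
/^[0-9]/ && NF==5 {
  c[$1]++
  x[$1]+=$3
  y[$1]+=$4
  z[$1]+=$5
  p=1
}
# Print the averages for the last timestep.
END {
  for (i in c) {
    print i" "x[i]/c[i]" "y[i]/c[i]" "z[i]/c[i]
  }
}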
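
Since the result is wanted as a file, and awk does not guarantee any particular order for "for (i in c)", here is a variant sketch rather than a definitive version. It keeps the same idea (sum per ID, print at each new TIMESTEP), but prints IDs in the order they first appear in a block and writes a single "TIMESTEP n" header per block. The file names dump.txt and averages.txt, the function name report, and the "TIMESTEP n" header format are my assumptions, not something from the posts above; the input is just the two sample timesteps quoted earlier.

```shell
#!/bin/sh
# Sample input: the two timesteps quoted earlier in the thread.
# dump.txt and averages.txt are hypothetical file names.
cat > dump.txt <<'EOF'
SOURCE: TIMESTEP
1
SOURCE: NUMBER OF SERIES
594
ITEM: SHAPE CONSTRAIN pp pp pp
-10 30
-10 30
-10 30
ITEM: SERIES id type x y z
300 1 -0.018638 -2.55234 -6.32752
100 1 1.41254 -2.58375 -6.49904
200 1 -2.83075 0.0568085 -6.31853
100 1 -2.13227 -1.15906 -6.22779
200 1 -0.72638 -1.26406 -6.59486
300 1 0.0490859 0.005266 -6.90442
SOURCE: TIMESTEP
2
SOURCE: NUMBER OF SERIES
594
ITEM: SHAPE CONSTRAIN pp pp pp
-10 30
-10 30
-10 30
ITEM: SERIES id type x y z
100 1 -0.458893 -1.85236 -6.76382
200 1 2.44809 -1.75353 -6.94784
300 1 1.69774 -0.511574 -6.93282
300 1 -2.26969 -5.05726 -4.42174
200 1 0.629752 -5.01888 -4.87928
100 1 2.07906 -4.80346 -4.90814
EOF

awk '
# Print one line of averages per ID (first-seen order), then clear all sums.
function report(   k, id) {
  for (k = 1; k <= n; k++) {
    id = order[k]
    print id, x[id]/c[id], y[id]/c[id], z[id]/c[id]
  }
  split("", c); split("", x); split("", y); split("", z); split("", order)
  n = 0
}
# New block: flush the previous one and read the timestep number.
/^SOURCE: TIMESTEP/ {
  report()
  getline step
  print "TIMESTEP " step
  next
}
# Data row (id type x y z): remember new IDs, then accumulate.
/^[0-9]/ && NF == 5 {
  if (!($1 in c)) order[++n] = $1
  c[$1]++; x[$1] += $3; y[$1] += $4; z[$1] += $5
}
# Flush the last block.
END { report() }
' dump.txt > averages.txt

cat averages.txt
```

For the real file, only the awk part is needed, with the actual data file name in place of dump.txt. Every line that is neither a TIMESTEP header nor a 5-field data row is silently skipped, so the NUMBER OF SERIES and SHAPE CONSTRAIN lines never reach the output.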

begin_shell · May 12, 2013, 3:58pm

Thank you so much. You have really helped me a lot. I would be grateful if you could briefly explain how it works.