awk script automation

I have the code below, which calculates the time difference between a src and a dst in a large trace file. It works for one given source and destination, but I want to automate it so that it covers every src and dst pair. The source has the format X.Y, where X is always 2 and Y varies (0, 1, 2, 3, 4, ...). The destination has the format Z.T, where both Z and T vary (for example 3.5, 5.7, 5.100, ...).

BEGIN {
  src="2.2431"; dst="5.20"; 
  num_samples = 0;
  total_delay = 0;
}
/^\+/ && $9 == src && $10 == dst {
    t_arr[$12] = $2;
}
/^r/ && $9 == src && $10 == dst {
    if (t_arr[$12] > 0) {
        num_samples++;
        delay = $2 - t_arr[$12];
        total_delay += delay;
    }
}
END{
  avg_delay = total_delay/num_samples;
  print "Average end-to-end transmission delay is " avg_delay  " seconds";
  print "Measurement details:"; 
  print "  - Since packets are created from the address " src;
  print "  - Until the packets are destroyed at the address " dst;
};

My problem is how to automate the code, with a condition something like:
/^\+/&&$9==(2.anyNumber)&&$10==(AnyNumber.AnyNumber)

Sample of the input file:

        
         + 0.163944   2    1      a     40  -------       1      2.4      5.4     0    10
        + 0.215400    2    1      a     40  -------       1      2.4      5.4     1    28
        + 0.239528    2    1      t     40  -------        1      2.4      5.4    0    37
        + 0.287784    2    1      t     1040  -------     1      2.4      5.4    1    62
        + 0.287784    2    1      t     1040  -------     1      2.4      5.4    2    63
        - 0.147407    1     0     a     40 -------         1      2.1      5.1    0     6
        r 0.148256     0    5     a     40 -------          1     2.0       5.0    0     2
       + 0.148256     5    0      t     1040 -------      1      5.0      2.0    1     7
       + 0.166969     5    0      t     40 -------         1      5.5      2.5     0     11
        - 0.166969     5    0      t    40 -------          1      5.5      2.5    0    11
        r 0.188072     0    5      a    40 -------          1      2.4      5.4    0    10
        r 0.239528     0    5      a    40 -------          1      2.4      5.4    1    28
        r 0.263656     0    5      t    40 -------          1      2.4      5.4     0    37
        r 0.317128     0    5      t    1040 -------       1      2.4      5.4     1    62
        r 0.318792     0    5      t    1040 -------       1      2.4      5.4     2    63

Any suggestions?

$9 is the src value (X.Y).
$10 is the dst value (Z.T).

$2 is the time.
$12 is the packet ID, which is unique from source to destination. Because there are intermediate nodes between src and dst, the same ID may appear more than once, which is why I am taking the average.

the expected output:

average time from src: 2.Y -->Z.T :   value seconds

Please show your data and the expected output. What are $2, $9, $10 and $12?

Check my update! Thanks for pointing that out.

There are some duplicates in your file.

Is this what you wanted?

$ cat sample
+ 0.163944 2 1 a 40 ------- 1 2.4 5.4 0 10
+ 0.215400 2 1 a 40 ------- 1 2.4 5.4 1 28
+ 0.239528 2 1 t 40 ------- 1 2.4 5.4 0 37
+ 0.287784 2 1 t 1040 ------- 1 2.4 5.4 1 62
+ 0.287784 2 1 t 1040 ------- 1 2.4 5.4 2 63
r 0.188072 0 5 a 40 ------- 1 2.4 5.4 0 10
r 0.239528 0 5 a 40 ------- 1 2.4 5.4 1 28
r 0.263656 0 5 t 40 ------- 1 2.4 5.4 0 37
r 0.317128 0 5 t 1040 ------- 1 2.4 5.4 1 62
r 0.318792 0 5 t 1040 ------- 1 2.4 5.4 2 63
$ cat test.sh
#!/bin/bash

# Expect three arguments: trace file, source address, destination address.
if [ "$#" -lt 3 ]; then echo "Missing arguments, exiting.."; exit 1; fi

awk -v src="$2" -v dst="$3" '
BEGIN {
      num_samples = 0;
      total_delay = 0;
      }

$1 ~ /^\+/ && $9==src && $10==dst {
                        t_arr[$12] = $2;
                       }

$1 ~ /^r/  && $9==src && $10==dst {
                        if (t_arr[$12] > 0) {
                                      num_samples++
                                      delay = $2 - t_arr[$12]
                                    total_delay += delay
                                    }
                          }
END{
  if(num_samples){
            avg_delay = total_delay/num_samples
            print "Average end-to-end transmission delay is " avg_delay  " seconds"
            print "Measurement details:" 
            print "  - Since packets are created from the address " src
            print "  - Until the packets are destroyed at the address " dst
         }
  else
        {
         print "Not Found..."
        }
}' "$1"

Usage

$ bash test.sh filename source destination
$ bash test.sh sample 2.4 5.4
Average end-to-end transmission delay is 0.0265472 seconds
Measurement details:
  - Since packets are created from the address 2.4
  - Until the packets are destroyed at the address 5.4

Actually, src and dst should be read from the file itself. I don't want to pass the values in, because there are many sources (2.1, 2.2, 2.3, ..., 2.1489) and the destinations vary as well. The change could be made to the condition:

$1 ~ /^\+/ && $9==src && $10==dst ..... where the values of src and dst are extracted from the input file. I think the way to do it is probably a regular expression, which I am new to. It should be something like:

/^\+/&&$9==/^2.* && $10== anything.. I will update the sample above.

Then do it like this:

$1 ~ /^\+/ && $9~/^2\..*/ && $10
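For what it's worth, here is a small sketch of that condition on its own, assuming the source is always `2.` followed by digits and the destination is any `digits.digits` address. The helper name `match_pairs` and the tighter `[0-9]+` expressions are my own illustration, not something from the posts above. A regex is used for `$10` because a bare `$10` test is evaluated as a number, so an address such as `0.0` would count as false.

```shell
#!/bin/sh
# match_pairs (hypothetical helper): print "src dst" for every enqueue (+) line
# whose source is 2.<digits> and whose destination is <digits>.<digits>.
match_pairs() {
    awk '$1 == "+" && $9 ~ /^2\.[0-9]+$/ && $10 ~ /^[0-9]+\.[0-9]+$/ { print $9, $10 }'
}

# Two lines taken from the sample trace: only the first has a source starting with "2."
out=$(printf '%s\n' \
    '+ 0.163944 2 1 a 40 ------- 1 2.4 5.4 0 10' \
    '+ 0.148256 5 0 t 1040 ------- 1 5.0 2.0 1 7' | match_pairs)
echo "$out"
```

Only the `2.4 5.4` pair is printed; the `5.0 2.0` line is filtered out by the source pattern.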

Okay, the condition seems to work (I am not sure yet), but how do I make the code print the time for each src ---> dst pair? I think what I am getting is only the time for the last src -> dst pair. I tried moving the print statement before the END block, but I still get a single value. I would expect:

Average end-to-end transmission delay is src 2.y to dst1 --> 0.1 seconds 
Average end-to-end transmission delay is src 2.y to dst2 --> 0.4 seconds
Average end-to-end transmission delay is src 2.y to dst3 --> 0.2 seconds

and so on

It should be done before the END block. The END block runs only once, after the last line has been read, so anything you print there appears only once.

Yes, I agree, but I am still getting only one sentence. Also, how do we get hold of the src we are calculating for, so that we can use its value in the print statement? I think we need to store each src and dst in an array and loop over it in the END block, or use a key map?


BEGIN {
  
  num_samples = 0;
  total_delay = 0;
  avg_delay=0;
}


/^\+/ && $9~/^2\..*/ && $10 {
    t_arr[$12] = $2;
};



/^r/&& $9~/^2\..*/ && $10{
    if (t_arr[$12] > 0) {
      num_samples++;
      delay = $2 - t_arr[$12];
    total_delay += delay;
    };
};

  avg_delay = total_delay/num_samples;
 
  print "Average end-to-end " avg_delay ;

  
  
  
END{
  #avg_delay = total_delay/num_samples;
  #print "Average end-to-end transmission delay is --> " avg_delay  " seconds";
 
};

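I am getting a syntax error!

Your END block contains nothing but comments. Be careful with that: if a comment shares a line with the closing brace, the brace is commented out as well, and that is a syntax error. Also note that the avg_delay assignment and the print statement sit outside any { } block, and awk does not accept a bare print there.

Have a look at this: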

$ echo Demo | awk 'END{print}'
Demo
$ echo Demo | awk 'END{#print}' 
awk: cmd. line:1: END{#print}
awk: cmd. line:1:      ^ syntax error
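The failure in the one-liner above comes from the `#`: a comment runs to the end of the line, so the closing brace is swallowed with it. A minimal sketch of a safe layout, with the comment and the brace on separate lines (the variable name `out` is just for illustration):

```shell
#!/bin/sh
# A comment runs to the end of its line, so keep the closing brace on a line of its own.
out=$(echo Demo | awk 'END {
    # this comment no longer hides the closing brace
    print
}')
echo "$out"
```

This prints `Demo` again, the same as the first one-liner.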

I am a bit confused about the input and the output you expect, and I did not understand how the expected output relates to the input you provided. I also noticed that you keep modifying your post. If you want to print as soon as num_samples becomes non-zero, you can try the following, which prints what you asked for in post #7:

awk '

$1 ~ /^\+/ && $9~/^2\..*/ && $10 {
                                          t_arr[$12] = $2;
                                  }

$1 ~ /^r/  && $9~/^2\..*/ && $10   {
                        if (t_arr[$12] > 0) {
                                      num_samples++
                                      delay = $2 - t_arr[$12]
                                      total_delay += delay
                                    }
                            }
num_samples{
            avg_delay = total_delay/num_samples
            print "Average end-to-end transmission delay", $9, "to", $10, "is", avg_delay  " seconds"
            num_samples=0         
        
           }' file
Average end-to-end transmission delay 2.4 to 5.4 is 0.024128 seconds
Average end-to-end transmission delay 2.4 to 5.4 is 0.048256 seconds
Average end-to-end transmission delay 2.4 to 5.4 is 0.072384 seconds

Well, first of all I'd like to thank you for being patient and helpful.
I have updated my post several times in response to your questions (thank you) to make it clearer.

The latest code you provided is close to what I'm after. However, this is what I noticed:

1) The time difference (avg_delay) increases linearly, which means total_delay needs to be reset. I now reset total_delay=0 right after resetting num_samples.

2) With avg_delay I was trying to get the average for every unique communication, where a unique communication is one unique source --> one unique destination. For example, after I modified the code I got something like:

Average end-to-end transmission delay 2.5 to 6.5 is 4.34506 seconds
Average end-to-end transmission delay 2.5 to 6.5 is 4.37405 seconds
Average end-to-end transmission delay 2.6 to 6.6 is 4.40288 seconds
Average end-to-end transmission delay 2.6 to 6.6 is 4.43337 seconds
Average end-to-end transmission delay 2.7 to 6.7 is 4.46386 seconds
Average end-to-end transmission delay 2.7 to 6.7 is 4.49602 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.52335 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.55234 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.58133 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.61199 seconds
Average end-to-end transmission delay 2.8 to 6.8 is 4.64062 seconds
Average end-to-end transmission delay 2.8 to 6.8 is 4.67091 seconds

If the code were calculating correctly, I would not get:

Average end-to-end transmission delay 2.0 to 6.0 is 4.52335 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.55234 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.58133 seconds
Average end-to-end transmission delay 2.0 to 6.0 is 4.61199 seconds

Instead, I should get (4.55234 + 4.58133 + 4.61199)/3, where 3 is the number of samples, and the final line should look like:

Average end-to-end transmission delay 2.0 to 6.0 is 4.581887
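As a quick check of that arithmetic, a one-line sketch that averages the three values named in the formula (the variable name `avg` is only illustrative):

```shell
#!/bin/sh
# Average of the three delay values from the formula, printed with six decimals.
avg=$(awk 'BEGIN { printf "%.6f", (4.55234 + 4.58133 + 4.61199) / 3 }')
echo "$avg"
```

The sum is 13.74566, and dividing by 3 gives 4.581887 after rounding.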

Generally, I have two main conditions:
Condition 1: if the condition is true, the current time ($2) is recorded in array[$12]. The array index is $12, a unique ID for each unique communication, so I end up with one recorded time per unique communication.

Condition 2: looks up the recorded time for the unique ID and subtracts it from the current time.

Total delay is incremented by the calculated delay (assuming it is the same unique communication). The code worked when I specified the src and dst, so I'm assuming the logic does not need to change for the automated version.

I hope I have made it clear this time!
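One possible way to get a single average per unique src --> dst pair is to keep the running totals in arrays keyed by the pair and print them in a loop in the END block. This is only a sketch under a few assumptions: the helper name `pair_delays` and the array names are made up, sources are taken to be `2.<digits>`, and the per-packet logic is kept the same as in the original script (the last `+` time seen for a packet is the one subtracted).

```shell
#!/bin/sh
# pair_delays (hypothetical helper): one average delay line per src -> dst pair.
pair_delays() {
    awk '
    # Enqueue event from a 2.<digits> source: remember the send time per packet.
    $1 == "+" && $9 ~ /^2\.[0-9]+$/ {
        sent[$9, $10, $12] = $2
    }
    # Receive event for a packet we have seen: add its delay to the pair totals.
    $1 == "r" && (($9, $10, $12) in sent) {
        pair = $9 " to " $10
        total[pair] += $2 - sent[$9, $10, $12]
        count[pair]++
    }
    # Print one line per pair (awk does not guarantee the order of the loop).
    END {
        for (pair in total)
            printf "Average end-to-end transmission delay %s is %.6f seconds\n",
                   pair, total[pair] / count[pair]
    }' "$@"
}

# Lines taken from the sample trace in this thread.
out=$(pair_delays <<'EOF'
+ 0.163944 2 1 a 40 ------- 1 2.4 5.4 0 10
+ 0.215400 2 1 a 40 ------- 1 2.4 5.4 1 28
+ 0.239528 2 1 t 40 ------- 1 2.4 5.4 0 37
+ 0.287784 2 1 t 1040 ------- 1 2.4 5.4 1 62
+ 0.287784 2 1 t 1040 ------- 1 2.4 5.4 2 63
r 0.148256 0 5 a 40 ------- 1 2.0 5.0 0 2
+ 0.148256 5 0 t 1040 ------- 1 5.0 2.0 1 7
r 0.188072 0 5 a 40 ------- 1 2.4 5.4 0 10
r 0.239528 0 5 a 40 ------- 1 2.4 5.4 1 28
r 0.263656 0 5 t 40 ------- 1 2.4 5.4 0 37
r 0.317128 0 5 t 1040 ------- 1 2.4 5.4 1 62
r 0.318792 0 5 t 1040 ------- 1 2.4 5.4 2 63
EOF
)
echo "$out"
```

On these lines it prints a single line for 2.4 to 5.4 with 0.026547 seconds, which matches the 0.0265472 reported earlier for that pair. The 2.0 to 5.0 receive line is skipped because no matching `+` line was seen for it. Because the totals are kept per pair, nothing has to be reset between pairs.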