Problem using function in awk

I created two functions that each produce two random numbers, and I want to write those numbers to the output file, but it does not seem to work.

# Function rgaussian1(r1, r2)
# Gaussian random number generator
  function rgaussian1(r1, r2) {
      pi = 3.142
      v1 = sqrt( -2 * log(rand()) )
      v2 = 2 * pi * rand()
      r1 = a * sin(b)
      r2 = a * cos(b)
  }

# Function rgaussian2(r1, r2)
# Gaussian random number generator
  function rgaussian2(r1, r2) {

      do {
          v1 = 2 * rand() - 1
          v2 = 2 * rand() - 1
          rsq = v1 * v1 + v2 * v2
      } while (rsq > 1)

      fac = sqrt(-2 * log(rsq) / rsq)
      r1 = v2 * fac
      r2 = v1 * fac

  }

# Include gaussian distributed random numbers
  NF == 2 {
      rgaussian1(r1, r2)
      rgaussian2(r3, r4)
      print $0,$2,r1,r2,r3,r4
  }
$ 
$ awk '
function rgaussian1(r1, r2)
{      pi = 3.142
      v1 = sqrt( -2 * log(rand()) )
      v2 = 2 * pi * rand()
      r1 = a * sin(b)
      r2 = a * cos(b)
      print r1, r2
}
function rgaussian2(r1, r2)
{
      do {
          v1 = 2 * rand() - 1
          v2 = 2 * rand() - 1
          rsq = v1 * v1 + v2 * v2
      } while (rsq > 1)
      fac = sqrt(-2 * log(rsq) / rsq)
      r1 = v2 * fac
      r2 = v1 * fac
      print r1, r2
}
BEGIN {
  rgaussian1(0.556, 0.677);
  rgaussian2(0.799,0.808)
}'
0 0
-0.196896 0.195777
$ 
$ 

tyler_durden

A couple of things that might be causing you problems:

1) I assume that you want to invoke your functions with the values from the input line ($1 and $2) and not r1/r2, which are not defined.

2) If a variable has never been assigned, print treats it as the empty string and prints a blank rather than 0. print r1+0 will force it to print a number.
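A minimal sketch of both points, assuming a POSIX-style awk: scalar arguments are passed by value, so assigning to r1 and r2 inside a function only changes local copies, and the caller's variables stay unset. The names set_pair, x and y are made up for illustration and do not appear in the original script.

```shell
out=$(awk '
# set_pair is a hypothetical stand-in for rgaussian1/rgaussian2
function set_pair(r1, r2) {
    r1 = 1.5        # changes only the local copy
    r2 = 2.5
}
BEGIN {
    set_pair(x, y)
    print "x=[" x "] y=[" y "]"   # x and y are still unset, so they print as empty
    print "x+0=[" (x + 0) "]"     # adding 0 forces a numeric 0
}')
printf '%s\n' "$out"
```

The first line shows empty brackets for both variables; the second shows 0 because of the +0.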

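Another thing to note is that the logic of these functions is such that they return the same values for different arguments: neither function uses its arguments, and rgaussian1 computes v1 and v2 but then uses a and b, which are never set, so it always yields 0 0.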

$ 
$ 
$ awk '
function rgaussian1(r1, r2)
{      pi = 3.142
      v1 = sqrt( -2 * log(rand()) )
      v2 = 2 * pi * rand()
      r1 = a * sin(b)
      r2 = a * cos(b)
      print r1, r2
}
function rgaussian2(r1, r2)
{
      do {
          v1 = 2 * rand() - 1
          v2 = 2 * rand() - 1
          rsq = v1 * v1 + v2 * v2
      } while (rsq > 1)
      fac = sqrt(-2 * log(rsq) / rsq)
      r1 = v2 * fac
      r2 = v1 * fac
      print r1, r2
}
BEGIN {
  rgaussian1(200, 200);
  rgaussian2(0.1254544647, 0.57385902382745)
}'
0 0
-0.196896 0.195777
$ 
$ 

With some diagnostics added:

$ 
$ awk '
function rgaussian1(r1, r2)
{
  print "BEGIN rgaussian1 =>",r1, r2, r1+r2, r1*r2;
  pi = 3.142
  v1 = sqrt( -2 * log(rand()) )
  v2 = 2 * pi * rand()
  r1 = a * sin(b)
  r2 = a * cos(b)
  #print r1, r2
  print "END rgaussian1 =>",r1, r2, r1+r2, r1*r2;
}
function rgaussian2(r1, r2)
{
  print "BEGIN rgaussian2 =>",r1, r2, r1+r2, r1*r2;
  do {
    v1 = 2 * rand() - 1
    v2 = 2 * rand() - 1
    rsq = v1 * v1 + v2 * v2
  } while (rsq > 1)
  fac = sqrt(-2 * log(rsq) / rsq)
  r1 = v2 * fac
  r2 = v1 * fac
  #print r1, r2;
  print "END rgaussian2 =>",r1, r2, r1+r2, r1*r2;
}
BEGIN {
  rgaussian1(100, 100);
  rgaussian2(0.1254544647, 0.57385902382745)
}'
BEGIN rgaussian1 => 100 100 200 10000
END rgaussian1 => 0 0 0 0
BEGIN rgaussian2 => 0.125454 0.573859 0.699313 0.0719932
END rgaussian2 => -0.196896 0.195777 -0.00111973 -0.0385477
$ 
$ 

tyler_durden

The input arguments don't matter. I only meant them to carry values out of the function; I thought I would need to pass them in in order to get the values back out. I also thought of using "return r1", for example. Calling srand() within the BEGIN block should solve the problem of always getting the same values.
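A small sketch of the seeding behaviour, under the assumption of a POSIX-style awk. srand() with no argument seeds from the time of day (typically with one-second resolution, so two runs in the same second may still match), while srand(n) with a fixed n gives a repeatable sequence. The seed 42 is arbitrary. Some implementations, mawk for example, seed themselves at startup, so the "always the same values" symptom may not show up everywhere.

```shell
# Same explicit seed twice: the two runs produce identical numbers.
run1=$(awk 'BEGIN { srand(42); print rand(), rand() }')
run2=$(awk 'BEGIN { srand(42); print rand(), rand() }')
if [ "$run1" = "$run2" ]; then
    echo "same seed, same numbers: $run1"
fi

# Time-based seed, as suggested above: call srand() once in BEGIN.
awk 'BEGIN { srand(); print rand() }'
```

A fixed seed is handy while debugging; the time-based form is what you would normally want for the final script.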

I believe you could use the return statement, but it can return only one value.

$ 
$ awk '
function rgaussian2(r1, r2)
{
  do {
    v1 = 2 * rand() - 1
    v2 = 2 * rand() - 1
    rsq = v1 * v1 + v2 * v2
  } while (rsq > 1)
  fac = sqrt(-2 * log(rsq) / rsq)
  r1 = v2 * fac
  r2 = v1 * fac
  return r1
}
BEGIN {
  print rgaussian2(1.234586, 8.576848)
}'
-0.787923
$ 
$ 

Alternatively, you could return a formatted string with values of r1 and r2, and print them in your function call.

$ 
$ 
$ awk '
function rgaussian2(r1, r2)
{
  do {
    v1 = 2 * rand() - 1
    v2 = 2 * rand() - 1
    rsq = v1 * v1 + v2 * v2
  } while (rsq > 1)
  fac = sqrt(-2 * log(rsq) / rsq)
  r1 = v2 * fac
  r2 = v1 * fac
  return sprintf("%.20f %.20f",r1,r2)
}
BEGIN {
  x = "Play it again, Sam"
  y = 999
  printf("%s %d %s\n", x, y, rgaussian2(1.234586, 8.576848))
}'
Play it again, Sam 999 -0.78792338859613808566 -0.98884380096718782482
$ 
$ 
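If both numbers are wanted, another option is to have the function fill an array, since awk passes arrays by reference. This is only a sketch: gauss_pair, pair and g are hypothetical names, the sampling loop is the same polar method as rgaussian2 above, and the extra rsq == 0 check is my addition so that log(rsq) is always defined.

```shell
out=$(awk '
# gauss_pair is a hypothetical name; it stores two values in pair[1] and pair[2].
# v1, v2, rsq and fac are listed after the extra spaces so they stay local.
function gauss_pair(pair,    v1, v2, rsq, fac) {
    do {
        v1 = 2 * rand() - 1
        v2 = 2 * rand() - 1
        rsq = v1 * v1 + v2 * v2
    } while (rsq >= 1 || rsq == 0)   # reject points outside the unit circle, and 0
    fac = sqrt(-2 * log(rsq) / rsq)
    pair[1] = v1 * fac
    pair[2] = v2 * fac
}
BEGIN {
    srand()
    gauss_pair(g)                    # arrays are passed by reference
    printf "%.6f %.6f\n", g[1], g[2]
}')
printf '%s\n' "$out"
```

The values differ from run to run, so no sample output is shown here.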

tyler_durden

Yes, I think you are right: a function can only return one value. But it does not matter in my case. Since they are both random numbers, I can just use one of them.

I have this awk script, and I output mean and sigma, but the result is not what I assigned to mean and sigma.

 
 function rgaussian1() {
      a = rand()
      b = rand()
      c = rand()
      r1 = a + b + c
      return r1
  }

  function rgaussian11(mean, sigma) {
      a = rgaussian1()
      b = mean + (a * sigma)
      print "VALUES: ",mean,sigma
      return b
  }

  NF == 2 {
      a = 5
      b = 0.5
      r1 = rgaussian1()
      r2 = rgaussian11(a,b)
      print $0,$2,r1,r2
  }

As you can see below I am not getting the values of mean and sigma that I have passed to the function.

VALUES:  0.319071 0.692849
10 0  0 1.50904 -0.654771 1.0627 0.833281
VALUES:  0.244565 0.468954
13 5.74484  5.74484 2.47868 1.11852 0.990984 -0.889183
VALUES:  0.787803 0.202516
16 10.1769  10.1769 0.570419 -0.736454 1.48426 1.09622
VALUES:  0.187611 0.611989
19 13.8527  13.8527 1.40469 -0.309851 0.53924 -1.40168
VALUES:  0.669174 0.450629
22 16.8957  16.8957 1.38384 0.0894231 -0.804437 -0.168806
VALUES:  0.523561 0.360272
25 19.3552  19.3552 1.05967 1.17779 2.0299 -0.459928
VALUES:  0.749601 0.872226
28 21.3932  21.3932 1.02497 -0.17011 -1.12271 0.916729
VALUES:  0.39957 0.0115686
31 23.0869  23.0869 0.713036 -0.15845 -0.852606 -0.867672
VALUES:  0.488232 0.786265
34 24.5867  24.5867 1.38646 0.907416 0.215826 -0.207289
VALUES:  0.721516 0.283772
37 25.9775  25.9775 1.2652 0.18478 0.643813 1.46849
VALUES:  0.796942 0.08114
40 27.2779  27.2779 1.07348 2.44546 -0.369929 0.784368
VALUES:  0.32347 0.847031
43 28.5486  28.5486 1.86091 -0.337079 0.683475 0.124624
VALUES:  0.858226 0.607827
46 29.754  29.754 1.81017 0.697858 -0.547066 1.12782
VALUES:  0.096079 0.495649
49 30.9075  30.9075 1.82609 1.88842 -0.843563 0.535018
VALUES:  0.601441 0.696511
52 32.0179  32.0179 1.4897 0.233247 -0.0728211 -1.17591
VALUES:  0.460437 0.181952
55 33.0682  33.0682 0.657609 -0.594157 -0.79817 -0.281863
VALUES:  0.630648 0.42759
58 34.082  34.082 1.48051 0.869132 0.319669 -0.426564
VALUES:  0.446626 0.770096
61 35.0402  35.0402 1.70818 -0.775314 -0.0754717 -0.160226
VALUES:  0.127291 0.985517
64 35.9725  35.9725 1.59898 -1.89246 -0.441951 0.257042
VALUES:  0.970425 0.0665038

It might be because in function rgaussian11 the variables mean and sigma are local, whereas a and b are assigned in the function body and are therefore global, so your values outside and inside the function get mixed up...

Can't reproduce your error. It works as expected for me.

$ 
$ 
$ cat gaussian1.sh
##
awk '
  function rgaussian1() {
      a = rand()
      b = rand()
      c = rand()
      r1 = a + b + c
      return r1
  }
  function rgaussian11(mean, sigma) {
      print "(1)  VALUES: ",mean,sigma
      a = rgaussian1()
      b = mean + (a * sigma)
      print "(2)  VALUES: ",mean,sigma
      return b
  }
  BEGIN {
      srand()
      a = 5
      b = 0.5
      x = rgaussian11(a, b)
      print x
  }'
$ 
$ 
$ . gaussian1.sh
(1)  VALUES:  5 0.5
(2)  VALUES:  5 0.5
5.73322
$ 
$ 
$ . gaussian1.sh
(1)  VALUES:  5 0.5
(2)  VALUES:  5 0.5
5.99618
$ 
$ . gaussian1.sh
(1)  VALUES:  5 0.5
(2)  VALUES:  5 0.5
6.01297
$ 
$ . gaussian1.sh
(1)  VALUES:  5 0.5
(2)  VALUES:  5 0.5
5.27391
$ 
$ . gaussian1.sh
(1)  VALUES:  5 0.5
(2)  VALUES:  5 0.5
5.56646
$ 
$ 

tyler_durden

I have put this in, but the results are still wrong. Totally confused now.

  function rgaussian1() {
      a = rand()
      b = rand()
      c = rand()
      r1 = a + b + c
      return r1
  }

  function rgaussian11(mean, sigma) {
      print "VALUES: ",mean,sigma
      a = rgaussian1()
      r1 = mean + (a * sigma)
      return r1
  }

  BEGIN {
      srand()
  }

# Include gaussian distributed random numbers
  NF == 2 {
      a = 5
      b = 0.5
      r1 = rgaussian1()
      r2 = rgaussian11(a,b)
      print a,b
  }


VALUES:  0.60164 0.155695
1.92518 0.615652
VALUES:  0.0518825 0.208599
1.84557 0.612806
VALUES:  0.854148 0.27189
1.72343 0.0530989

By default variables in awk are global. The problem is that in the 'body' a is assigned 5, and from appearances the value of a should still be 5 when rgaussian11(a,b) is invoked. However, rgaussian1() sets a and thus it is not.

In awk, it is possible to declare a variable as local to a function. It is done by defining it as an argument, and convention is to separate the local variables with a tab to make it more obvious that the function is not expecting them to be passed in. For instance, your function should be defined like this:

function rgaussian1(            a, b, c, r1 )

This declares a,b,c and r1 as local variables. If this is done, the value of a in the 'body' of the programme is unchanged.
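As a rough check, here is the script from the thread with the locals declared in both function headers. It is a sketch that uses a BEGIN block instead of the NF == 2 rule so it runs without an input file; everything else follows the code posted above.

```shell
out=$(awk '
function rgaussian1(    a, b, c, r1) {        # a, b, c and r1 are local
    a = rand()
    b = rand()
    c = rand()
    r1 = a + b + c
    return r1
}
function rgaussian11(mean, sigma,    a, b) {  # a and b are local here too
    a = rgaussian1()
    b = mean + (a * sigma)
    return b
}
BEGIN {
    srand()
    a = 5
    b = 0.5
    r1 = rgaussian1()
    r2 = rgaussian11(a, b)
    print "a=" a, "b=" b                      # the globals are untouched
}')
printf '%s\n' "$out"
```

This prints a=5 b=0.5, because the assignments to a and b inside the functions no longer reach the global variables.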

As an afterthought, you might declare your other function in a similar fashion:

 function rgaussian11(mean, sigma,         a, b )

Declaring both variables a and b as local to the function.

Or use vars with different names or skip the intermediate vars altogether, e.g.:

  function rgaussian1() {
    return rand() + rand() + rand()
  }

  function rgaussian11(mean, sigma) {
      print "VALUES: ",mean,sigma
      return mean + (rgaussian1() * sigma)
  }

That's what I want them to be, local to the function as you say. I thought that as I do not want to return their values, I would not include them as function parameters.

Thanks a lot. I did not know this detail of how to define local variables in functions, by declaring all local function variables in the formal declaration of the function.

What you return using the 'return' statement has nothing to do with the variables that are declared as parameters in the function header. Any scalar named in the header is local to the function, and changes to it do not affect a global variable with the same name. (Arrays are the exception: they are passed by reference.)

Consider this small example:

#!/usr/bin/env ksh
awk '
        function demo( a, b,    c, d )       # the spacing suggests a and b are expected parameters, c and d are local variables
        {
                a += 1;               # do some work -- change the value of a
                c = a+b;
                d = a-b-1;
                e = a*b;
                printf( "in demo: a=%d  b=%d  c=%d d=%d e=%d\n", a, b, c, d, e );
        }

        BEGIN { 
                a = 2;
                b = 3;
                demo( a, b )
                printf( "in begin: a=%d  b=%d  c=%d d=%d e=%d\n", a, b, c, d, e );
        }
'

When executed, the output is this:

in demo: a=3  b=3  c=6 d=-1 e=9
in begin: a=2  b=3  c=0 d=0 e=9

The variable 'e' is not referenced in the function header and thus remains 9 after the call to the function demo(). Variable 'a', even though changed in the function, has the original value after the call, as do b, c and d.

One of the most common, and frustrating, mistakes when writing awk programmes is to invoke a function from inside a for loop using 'i' as the index variable only to have it changed unintentionally by the called function.
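The loop-index trap can be reproduced with a short sketch. The function names sum_global and sum_local and the counters buggy and fixed are invented for this illustration; the two functions are identical apart from the header.

```shell
out=$(awk '
function sum_global(n) {            # i and s are global here
    s = 0
    for (i = 1; i <= n; i++) s += i
    return s
}
function sum_local(n,    i, s) {    # i and s are local here
    s = 0
    for (i = 1; i <= n; i++) s += i
    return s
}
BEGIN {
    # sum_global(2) leaves the shared i at 3, so this loop stops after one pass
    for (i = 1; i <= 3; i++) { sum_global(2); buggy++ }
    # sum_local(2) has its own i, so this loop runs all three passes
    for (i = 1; i <= 3; i++) { sum_local(2); fixed++ }
    print "iterations with global i:", buggy
    print "iterations with local i:", fixed
}')
printf '%s\n' "$out"
```

The first loop reports 1 iteration and the second reports 3.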

Oops -- looks like we crossed our posts, extra explanation never hurts though.

So one never passes the local variables to the function. Is that correct? We pass only the arguments we actually want the function to receive.

That was a superb example, agama. You said it all.

Nothing stops you from calling a function and passing values for the 'local' variables as well, but the function will most likely ignore them. The demo() function could have been invoked as demo(a,b,1,2), but that would not have changed the result: the function does not expect c and d to contain useful information and assigns values to them before using them in the printf().
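A quick sketch of that point, using a cut-down variant of demo() that returns a value instead of printing (the return expression is my own choice for illustration):

```shell
out=$(awk '
function demo(a, b,    c, d) {
    c = a + b          # c and d are assigned before they are used,
    d = a - b          # so any values passed in for them are overwritten
    return c * d
}
BEGIN {
    first  = demo(2, 3)
    second = demo(2, 3, 100, 200)   # extra arguments land in c and d
    print first, second
}')
printf '%s\n' "$out"
```

Both calls give -5, so the output is "-5 -5".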

Glad the example helped.

How does the function know c and d are local? From what I understand now, all variables named in the function header are local, even 'a' and 'b'. It just depends on whether we use 'a' and 'b' as inputs to some calculation. Does one actually need to put a tab to mark which variables are local?

You are correct, all variables that appear in the function declaration are local. The tab, or extra white space, is only to help humans who come along and maintain your programme. It's an easy signal that everything to the right of the extra space is just a local variable and that the function is not expecting to have those values passed in.
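One way to see the difference is a sketch with two hypothetical counters, bump_local and bump_global. A parameter that is not passed in starts out empty on every call, whereas a variable that is not in the header keeps its value between calls.

```shell
out=$(awk '
function bump_local(    n) { n++; return n }   # n starts out empty on every call
function bump_global()     { m++; return m }   # m is global and keeps its value
BEGIN {
    l1 = bump_local();  l2 = bump_local()
    g1 = bump_global(); g2 = bump_global()
    print "local:", l1, l2
    print "global:", g1, g2
}')
printf '%s\n' "$out"
```

The local counter returns 1 both times; the global one returns 1 and then 2.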

I've been using awk for the last 15 years or so, and I still get tripped up now and again because I forget to put a local variable in the header. Not one of awk's easier concepts to explain or understand; you've done well with it.

Thanks.