Begin/End blocks in awk: confused

newbie2010 · January 6, 2014, 3:15pm

I am trying to understand how to use the END block in awk without much success. I have this script that I found:

gawk '{count[$2]++; keyword[$2] = $1}
END {
    for (k in count) {
        if (count[k] == 3) keyword[k] = "order this"
        else print keyword[k] " " k
    }
}' < orderfile

Is that the way the END block should be used? I am confused; I thought the BEGIN block went at the beginning of the script. Does anyone have an example of how BEGIN could be used here? Or does only END apply? I thought BEGIN ran while the input was being read.

blackrageous · January 6, 2014, 3:48pm

No. BEGIN and END blocks in awk run before and after all lines are processed, respectively, and appear as follows in an awk script:

BEGIN{do something}
more awk script
END{do something}

I suggest you call gawk in this fashion until you are comfortable with awk/gawk...

gawk -f awk.script.txt file.to.parse.txt

awk is arguably a 4GL, since every line of the file to be parsed is run through each rule of the awk script, with the exception of the code in the BEGIN and END sections. Again, BEGIN is processed once before any line is read, and END once after every line has been processed. The BEGIN section is useful for printing headers or initializing counters. The END section is useful for performing final calculations, etc.
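As a rough illustration of the header/counter/total pattern described here, the sketch below uses made-up sample data (item names and quantities are assumptions, not from the original script) and keeps the result in a variable so it can be printed:

```shell
# Hypothetical sample data: an item name and a quantity on each line.
result=$(printf 'apples 3\npears 5\nplums 2\n' | awk '
BEGIN { print "ITEM QTY"; total = 0 }   # runs once, before any input is read
      { print $1, $2; total += $2 }     # runs once for every input line
END   { print "TOTAL", total }          # runs once, after the last line
')
printf '%s\n' "$result"
```

The header comes out first, then one line per record, then the total, regardless of where BEGIN and END are written in the script.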


newbie2010 · January 6, 2014, 4:16pm

So would you use BEGIN/END like this?

gawk '{count[$2]++; keyword[$2] = $1}
END {
    # look at every count we have gotten; k will be the order
    for (k in count) {
        if (count[k] == 3) keyword[k] = "order this"
        else print keyword[k] " " k
    }
}' < orderfile

Or would BEGIN be there? The script I have shows END in the beginning.

RudiC · January 6, 2014, 4:24pm

Well, you can have several of each; the following would be perfectly legitimate:

awk 'END {print "4"} BEGIN {print "3"} END {print "7"} BEGIN {print "6"}'

And what you propose above is fine, although be aware that the order in which for (x in Y) supplies the xs is undefined (an awk feature).
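To see what the one-liner above does, it can be run with empty input. This is only a sketch; redirecting from /dev/null is an addition here so that awk does not wait on the terminal for input before reaching the END rules:

```shell
# All BEGIN rules run first, in the order written; then all END rules, in the order written.
result=$(awk 'END {print "4"} BEGIN {print "3"} END {print "7"} BEGIN {print "6"}' < /dev/null)
printf '%s\n' "$result"
```

This prints 3, 6, 4, 7, one per line: both BEGIN actions come before both END actions, each group in the order it appears in the script.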

newbie2010 · January 6, 2014, 4:36pm

As this script is not mine, I am curious as to why this would be included in the END part

END {
    for (k in count)
        if (count[k] == 3) keyword[k] = "order this"
        else print keyword[k] " " k
}

I don't understand the rules for what you put in an END/BEGIN block or why. I always believed it was invoked when you were printing headers or footers, but it appears from the above that this is only part of its use. Does someone have an example? I have searched, but there are many and I am not sure about this rule.

RudiC · January 6, 2014, 4:49pm

awk takes a set of files as arguments, which altogether present a stream of lines to it. Line after line is processed, with the internal FILENAME variable changing if need be (if switching to the next file).
Before any line of the stream is read, ALL of the BEGIN actions are processed. You can use this to initialize variables, print headers, what have you.
After the last line of the entire stream, possibly consisting of many a file, ALL the END actions will be processed, e.g. for printing totals. That means, you can calculate your count[$2]++ during normal processing, and at the END, print out (in whatever structure) what you have computed so far.
And it looks to me as if that's exactly what that code snippet you presented is doing...
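A small sketch of that accumulate-then-report pattern follows. The input lines are invented for illustration (the real orderfile is not shown in this thread), and the output is piped through sort because the for (k in count) order is not defined:

```shell
# Hypothetical input: a keyword in field 1 and an item in field 2.
result=$(printf 'red widget\nblue gadget\ngreen widget\n' | awk '
{ count[$2]++ }                              # main rule: tally each value of field 2
END { for (k in count) print k, count[k] }   # after all input: report the tallies
' | sort)
printf '%s\n' "$result"
```

Nothing is printed while the lines are being read; the tallies only become complete once the whole stream has been seen, which is why the reporting loop sits in END.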

blackrageous · January 6, 2014, 5:31pm

BEGIN and END blocks are optional. In this case, END is not at the beginning; it is at the end. What is probably confusing you is the manner in which this is being run: the awk script is specified inline (as opposed to using the -f switch to specify an awk script file).

Here is the start of the awk code

{count[$2]++; keyword[$2] = $1}

The END block follows that. So there is no BEGIN section. This means every line of the input file will be processed by that first rule.
This may help...

At this point, I suggest reading up on awk and looking at examples.
Basic ways to call awk...

awk 'instructions' file.to.parse.txt
awk -f awkscript.awk file.to.parse.txt
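For anyone who wants to try both forms side by side, here is a minimal sketch. The file contents and the temporary directory are assumptions made up for the demonstration; only the two calling styles come from the lines above:

```shell
# Set up a throwaway input file and script file (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/file.to.parse.txt"
printf 'END { print NR, "lines" }\n' > "$tmpdir/awkscript.awk"

# Same program, given inline and then via -f.
inline=$(awk 'END { print NR, "lines" }' "$tmpdir/file.to.parse.txt")
fromfile=$(awk -f "$tmpdir/awkscript.awk" "$tmpdir/file.to.parse.txt")
printf '%s\n%s\n' "$inline" "$fromfile"

rm -rf "$tmpdir"
```

Both calls should report the same line count; the only difference is where awk finds the program text.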