Is that the way the END block should be used? I am confused; I thought the BEGIN block went at the beginning of the script. Does anyone have an example of how BEGIN could be used here? Or does only END apply? I thought BEGIN ran while the input was being read.
No. The BEGIN and END blocks in awk run before and after all input lines are processed, respectively, and appear in an awk script as follows:
BEGIN{do something}
more awk script
END{do something}
I suggest you call gawk in this fashion until you are comfortable with awk/gawk...
gawk -f awk.script.txt file.to.parse.txt
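As a minimal sketch of that workflow, the script file below uses all three kinds of section: BEGIN prints a header once, the middle rule runs for each input line, and END prints a total once. The file names are the ones from the command above, but their contents (an item name and a quantity per line) are invented for illustration. Plain awk is used so it runs anywhere; gawk accepts the same -f switch.

```shell
# Hypothetical input: item name and quantity on each line.
printf '%s\n' 'apple 3' 'pear 5' 'plum 2' > file.to.parse.txt

# The awk script file (contents are illustrative).
cat > awk.script.txt <<'EOF'
BEGIN { print "item qty" }          # once, before any input is read
{ print $1, $2; total += $2 }       # once per input line
END { print "total", total }        # once, after the last line
EOF

# Equivalent to: gawk -f awk.script.txt file.to.parse.txt
awk -f awk.script.txt file.to.parse.txt
```

This prints the header, the three input lines, and finally "total 10".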
awk is arguably a 4GL, since every line of the file being parsed is run through every rule of the awk script, except the rules in the BEGIN and END sections. Again, BEGIN is processed once, before the first input line is read, and END once, after the last line has been processed. The BEGIN section is useful for printing headers or initializing counters; the END section is useful for final calculations, totals, and the like.
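A quick way to see this ordering for yourself is to print NR (the number of records read so far) in each section. This is a small sketch with made-up three-line input; show_order is just an illustrative function name.

```shell
# Hypothetical helper: prints which section runs, and when.
show_order() {
    awk '
    BEGIN { print "BEGIN: NR =", NR }    # no line has been read yet
          { print "line", NR ":", $0 }   # runs for every input line
    END   { print "END: NR =", NR }      # all lines have been read
    '
}

printf '%s\n' a b c | show_order
```

BEGIN reports NR = 0, each line is then handled in turn, and END reports NR = 3.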
gawk '{count[$2]++; keyword[$2] = $1}
END {
    # look at every key counted; k takes each distinct $2 value, in no particular order
    for (k in count)
        if (count[k] == 3) keyword[k] = "order this"   # relabelled, but not printed
        else print keyword[k] " " k
}' orderfile
Or should BEGIN go there? The script I have shows END at the beginning.
Since this script is not mine, I am curious why this part is included in the END block:
END {
    for (k in count)
        if (count[k] == 3) keyword[k] = "order this"
        else print keyword[k] " " k
}
I don't understand the rules for what goes in the END/BEGIN blocks, or why. I always believed they were used for printing headers or footers, but from the above it appears that is only part of their use. Does anyone have an example? I have searched, but there are many examples and I am still not sure about the rule.
awk takes a set of files as arguments, which together present a single stream of lines to it. The lines are processed one after another, with the built-in FILENAME variable changing whenever awk moves on to the next file.
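To watch the stream cross a file boundary, the sketch below feeds awk two small made-up files. FILENAME changes when awk moves on to the second file, FNR restarts for each file, and NR keeps counting across the whole stream, which is why END sees the grand total. The file names and the helper name per_file are hypothetical.

```shell
# Two hypothetical input files.
printf '%s\n' a b > first.txt
printf '%s\n' c > second.txt

# Hypothetical helper: file name, per-file line number, overall line number.
per_file() {
    awk '{ print FILENAME, FNR, NR }
         END { print "lines:", NR }' "$@"
}

per_file first.txt second.txt
```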
Before any line of the stream is read, ALL of the BEGIN actions are processed. You can use this to initialize variables, print headers, and so on.
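For instance, setting the field separator belongs in BEGIN, because it has to be in place before the first line is split into fields. A minimal sketch with invented colon-separated input; users is a made-up helper name.

```shell
# Hypothetical helper: FS and a header set up in BEGIN, then one output line per input line.
users() {
    awk 'BEGIN { FS = ":"; print "user uid" }   # runs before line 1 is read
         { print $1, $3 }'
}

printf '%s\n' 'root:x:0' 'daemon:x:1' | users
```

Output fields are still joined with a space, because only FS (the input separator) was changed, not OFS.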
After the last line of the entire stream (which may span many files), ALL of the END actions are processed, e.g. for printing totals. That means you can accumulate count[$2]++ during normal processing and, at the END, print out (in whatever structure) what you have computed.
And it looks to me as if that's exactly what that code snippet you presented is doing...
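Here is a minimal sketch of that accumulate-then-report pattern, using a made-up orderfile whose lines are "keyword item". The main rule only counts; nothing is printed until END. Because for (k in count) visits keys in no guaranteed order, the output is piped through sort. The helper name tally is hypothetical.

```shell
# Hypothetical orderfile: keyword in field 1, item in field 2.
printf '%s\n' 'red apple' 'green apple' 'sour apple' 'ripe pear' > orderfile

# Hypothetical helper: count items while reading, report once at END.
tally() {
    awk '{ count[$2]++; keyword[$2] = $1 }      # every line: just accumulate
         END {                                  # once, after the last line
             for (k in count)
                 print k, count[k], keyword[k]  # keyword holds the last one seen
         }' "$@" | sort
}

tally orderfile
```

This prints "apple 3 sour" followed by "pear 1 ripe".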
BEGIN and END blocks are optional. In this case, END is not at the beginning; it is at the end. What is probably confusing you is the way the script is being run: the awk program is given inline on the command line (as opposed to using the -f switch to read it from a script file).
Here is the start of the awk code, the part that runs for every line:
{count[$2]++; keyword[$2] = $1}
The END block follows it, and there is no BEGIN section. This means every line of the input file is processed by that one rule, and only after the last line does the END block run.
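As for how BEGIN could be used here: nothing in this script strictly needs one, but a BEGIN block could, for example, print a header line before any input is read. A sketch of that, where the header text, the sample orderfile, and the helper name with_header are all made up for illustration:

```shell
# Hypothetical orderfile: "apple" appears 3 times, "pear" once.
printf '%s\n' 'red apple' 'green apple' 'sour apple' 'ripe pear' > orderfile

with_header() {
    awk 'BEGIN { print "keyword item" }        # hypothetical header, printed once
         { count[$2]++; keyword[$2] = $1 }
         END {
             for (k in count)
                 if (count[k] == 3) keyword[k] = "order this"
                 else print keyword[k] " " k
         }' "$@"
}

with_header orderfile
```

Only the header and "ripe pear" are printed: "apple" was counted three times, so its branch just reassigns keyword["apple"] without printing anything.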
This may help...
At this point, I suggest reading up on awk and looking at examples.
Basic ways to call awk...