awk Behavior

Linux Release

Uname details

Data file

I've been at the command line for some time, back as far as SCO and Interactive Unix. I have always used this construct without issues. I want to isolate the IP address (field 1). As you can see, the first line is "skipped".

This works as expected. But again, what's changed?

Thanks !

A BEGIN rule is executed only once, before the first input record is read. This is why the code below works as expected:

awk 'BEGIN { FS = "," };{print $1}' dafile

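But in this code, FS is assigned only after the first input record has already been read and split into fields, so the new separator first takes effect on the second record: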

awk '{FS="," }{print $1}' dafile

I would change the statement shown above in red to:

Other ways to make sure that the FS you want is used to split every input line include:

awk -F',' '{print $1}' dafile
awk '{print $1}' FS=',' dafile
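The four variants discussed above can be compared side by side. This is a minimal sketch: the file contents are hypothetical (only the first line, 10.10.10.10,house, appears in this thread; the other second fields are made up), and the result of the main-rule variant depends on the awk implementation, so no output is claimed for it.

```shell
#!/bin/sh
# Hypothetical sample data: only the first line is taken from the thread,
# the remaining second fields are invented for illustration.
dafile=$(mktemp)
printf '%s\n' '10.10.10.10,house' '10.10.10.11,garage' \
              '10.10.10.12,shed'  '10.10.10.13,barn' > "$dafile"

# FS assigned in a main rule: by the time the assignment runs, record 1
# has typically been split with the default separator already.
main_rule=$(awk '{ FS = "," } { print $1 }' "$dafile")

# FS assigned before any record is read: three equivalent spellings.
begin_rule=$(awk 'BEGIN { FS = "," } { print $1 }' "$dafile")
dash_f=$(awk -F',' '{ print $1 }' "$dafile")
cmd_assign=$(awk '{ print $1 }' FS=',' "$dafile")

printf 'main rule:\n%s\n\nBEGIN rule:\n%s\n' "$main_rule" "$begin_rule"
rm -f "$dafile"
```

The last three variants should print only the IP column for every line, including the first one.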

Thank you Don. I checked gawk code in field.c - routines for dealing with fields and record parsing.

So record parsing happens first with the default field separator; the new field separator is then used to parse subsequent records.

I also noticed that function set_NF is called before record parsing. So gawk behavior for this variable is different.

awk -F, '{NF=1}{print $NF}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13

Any idea why the developers didn't do the same with function set_FS?

Old awk and nawk appear inconsistent:

nawk '{print $1; FS=","; print $1}' dafile
10.10.10.10,house
10.10.10.10,house
10.10.10.11
10.10.10.11
10.10.10.12
10.10.10.12
10.10.10.13
10.10.10.13
nawk '{FS=","; print $1}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13

It looks like they have a "late field splitting" that occurs when a field is referenced the first time.
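If FS really has to be changed inside a main rule, one way to avoid depending on when a given awk splits the record is to force a re-split by assigning $0 to itself after changing FS. A minimal sketch, again with hypothetical file contents (only the first line comes from this thread):

```shell
#!/bin/sh
# Hypothetical sample data, same shape as the file discussed in the thread.
dafile=$(mktemp)
printf '%s\n' '10.10.10.10,house' '10.10.10.11,garage' \
              '10.10.10.12,shed'  '10.10.10.13,barn' > "$dafile"

# Assigning to $0 makes awk split the current record again,
# this time with the FS value that was just set.
resplit=$(awk '{ FS = ","; $0 = $0; print $1 }' "$dafile")

printf '%s\n' "$resplit"
rm -f "$dafile"
```

Setting FS in a BEGIN rule or with -F remains the simpler choice; the re-split is only useful when the separator is not known until a record has been seen.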

Even though this discussion about awk intrinsics is fascinating and my horizon was expanded (a collective "thank you" to you all in this thread), just for the record:

Wouldn't using the shell's own means (variable expansion or field splitting) be less costly than calling an external program? I suppose the original poster does something with the values once they are split, something along the lines of:

awk -F',' '{print $1}' datafile | while read IP ; do ..... done

In such a case it might be easier to do:

while IFS=, read IP junk ; do ..... done < datafile

or, depending on what else is done:

while read LINE ; do
     IP="${LINE%%,*}"   # %% removes everything from the first comma onward
     .....
done < datafile
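Both shell-only approaches can be tried in a runnable form. This is a sketch under assumptions: the file contents are hypothetical apart from the first line, and the loop bodies merely collect the addresses in a variable, since the thread does not say what is done with them.

```shell
#!/bin/sh
# Hypothetical sample data: comma-separated, IP address in field 1.
datafile=$(mktemp)
printf '%s\n' '10.10.10.10,house' '10.10.10.11,garage' \
              '10.10.10.12,shed'  '10.10.10.13,barn' > "$datafile"

# Variant 1: let read split on commas; "rest" soaks up all remaining fields.
ips_read=''
while IFS=, read -r ip rest; do
    ips_read="${ips_read}${ip} "
done < "$datafile"

# Variant 2: read the whole line, then cut at the first comma
# with parameter expansion.
ips_expand=''
while IFS= read -r line; do
    ip=${line%%,*}
    ips_expand="${ips_expand}${ip} "
done < "$datafile"

printf 'read:      %s\nexpansion: %s\n' "$ips_read" "$ips_expand"
rm -f "$datafile"
```

Neither loop starts an external process per line, which is the saving the suggestion above is after.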

bakunin

I have not looked at the gawk code (and for legal reasons choose not to do so). But one might guess that a function named set_NF() would set the value of the awk NF variable. Are you really telling me that gawk sets the value of NF for a new input record BEFORE parsing that record into fields? That makes absolutely no sense to me! How can it set NF before it parses a record into fields to determine what value should be assigned to NF? One might expect that a function like that would be called to parse an input line, or AFTER parsing an input line, depending on the context. In the context of reading a new record from an input file at the start of a new cycle and in the context of using the awk command:

getline

with no argument naming a variable to be assigned and with no input redirection, that should happen (as well as setting $x (for 0 <= x <= NF), NR, and FNR). In the context of reading a new record from an input file using the awk command:

getline variable

with a variable, but no input redirection, NR and FNR should be updated, but NF and the current record's fields should not be modified. In the context of reading a new record from an input file using the awk command:

getline variable < file
        or
command | getline variable

with a variable and with input redirection, none of the variables NF, NR, and FNR, nor the current record's fields, should change.
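The getline cases described above can be observed directly. A minimal sketch, assuming a small three-line input created on the spot; the names tmp, f, line and other are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Hypothetical three-record input with 3, 2 and 1 fields.
tmp=$(mktemp)
printf '%s\n' 'a b c' 'd e' 'f' > "$tmp"

result=$(awk -v f="$tmp" 'NR == 1 {
    printf "start: NR=%d NF=%d rec=%s\n", NR, NF, $0

    # getline var: reads the next record into line, updates NR and FNR,
    # leaves $0, the fields and NF alone.
    getline line
    printf "getline var: NR=%d NF=%d rec=%s line=%s\n", NR, NF, $0, line

    # plain getline: replaces $0, re-splits the fields, updates NF, NR, FNR.
    getline
    printf "getline: NR=%d NF=%d rec=%s\n", NR, NF, $0

    # getline var < file: sets only the variable.
    getline other < f
    printf "getline var <file: NR=%d NF=%d other=%s\n", NR, NF, other
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With a POSIX-conforming awk this should print:

```
start: NR=1 NF=3 rec=a b c
getline var: NR=2 NF=3 rec=a b c line=d e
getline: NR=3 NF=1 rec=f
getline var <file: NR=3 NF=1 other=a b c
```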