Example:
$ cat file1
2
3
$ cat file2
1
2
3
4
5
6
The following awk script works like a charm: NR==FNR
is true only while file1 is read, and the remaining rule runs for file2:
awk '
NR==FNR {A[$1]; next}
($1 in A)
' file1 file2
2
3
Now make file1 empty:
>file1
and run the awk script again.
The result is empty, as expected.
However, this time the NR==FNR action was run for file2!
Because file1 contributes no records, NR and FNR stay equal while file2 is read.
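A minimal way to see why is to print NR and FNR for every record. This sketch assumes a POSIX-like shell with mktemp available, and creates its sample files in a scratch directory so real files are not overwritten:

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber real ones
cd "$(mktemp -d)" || exit 1
: > file1                      # empty first file
printf '%s\n' 1 2 3 > file2    # second file with three lines
# Print the file name plus NR and FNR for every record read
out=$(awk '{ print FILENAME, NR, FNR }' file1 file2)
printf '%s\n' "$out"
```

Every line comes from file2 with NR equal to FNR, so a rule guarded by NR==FNR cannot tell file2 apart from a "first file".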
Check with
awk '
NR==FNR {A[$1]; print FILENAME,$1; next}
($1 in A)
' file1 file2
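Here the result happened to be correct, but only by luck: all lines of file2 went into A, so the ($1 in A) rule never ran.
In other cases this leads to misbehavior. For example, the inverted test !($1 in A) should print every line of file2 when file1 is empty, but it prints nothing.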
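A small reproduction of that failure, as a sketch assuming a POSIX-like shell with mktemp (file names and contents are made up for illustration):

```shell
#!/bin/sh
# Scratch directory so the sample files don't overwrite real ones
cd "$(mktemp -d)" || exit 1
: > file1                          # empty "keys" file
printf '%s\n' 1 2 3 > file2
# Intended: print the lines of file2 whose $1 is NOT a key from file1
out=$(awk '
NR==FNR {A[$1]; next}
!($1 in A)
' file1 file2)
# With an empty file1 nothing is printed, although all of file2 was expected
printf 'output: [%s]\n' "$out"
```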
The following fix is available:
awk '
FILENAME=="file1" {A[$1]; print FILENAME,$1; next}
($1 in A)
' file1 file2
But this is not always applicable, for example when the files come from a wildcard such as file*.
So here is a better fix:
awk '
F==0 {A[$1]; print FILENAME,$1; next}
($1 in A) {print}
' file1 F=1 file2
Now F is undefined (which compares equal to 0) while file1 is read, and it is set to 1 before file2 is opened.
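A quick check that the F variable survives an empty file1, here with the inverted test !($1 in A). This is a sketch assuming a POSIX-like shell with mktemp; the sample data is made up:

```shell
#!/bin/sh
# Scratch directory so the sample files don't overwrite real ones
cd "$(mktemp -d)" || exit 1
: > file1                          # empty "keys" file
printf '%s\n' 1 2 3 > file2
# F is unset (compares as 0) while file1 is read;
# the assignment F=1 is processed just before file2 is opened
out=$(awk '
F==0 {A[$1]; next}
!($1 in A)
' file1 F=1 file2)
printf '%s\n' "$out"
```

Since no record of file2 is ever seen with F==0, all three lines of file2 are printed, which is the expected result for an empty key file.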
You can even continue like this: file1 F=1 file2 F=2 file3
and then distinguish between file2 and file3 as well.
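A sketch of the three-file variant, assuming a POSIX-like shell with mktemp and made-up sample files; !F selects the first file, F==1 the second and F==2 the third:

```shell
#!/bin/sh
# Scratch directory so the sample files don't overwrite real ones
cd "$(mktemp -d)" || exit 1
printf '%s\n' 2 3 > file1          # keys
printf '%s\n' 1 2 3 4 > file2
printf '%s\n' 3 4 5 > file3
# F is unset in file1, 1 in file2 and 2 in file3
out=$(awk '
!F   {A[$1]; next}
F==1 {if ($1 in A) print "file2:", $0; next}
F==2 {if ($1 in A) print "file3:", $0}
' file1 F=1 file2 F=2 file3)
printf '%s\n' "$out"
```

Each matching line is tagged with the file it came from: file2 contributes 2 and 3, file3 contributes 3.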
The ultra-short-code-hackers can even use !F.
Another option would be to just check the filename:
awk '
BEGIN {f=FILENAME}
FILENAME==f {A[$1]; f=FILENAME; next}
($1 in A)
' file1 file2
The ultra-short-code-hackers can even use:
awk '
FILENAME==f {A[$1]; f=FILENAME; next}
($1 in A)
' f=file1 file1 file2
Hmm, what is the f=FILENAME in the main loop for?
Also, in your first example, BEGIN {f=FILENAME} only works with nawk and derived awks.
I get the impression that's why that feature exists, so you can process different files with their own default values of some sort.
Sure.
The most useful one is FS, as in file1 FS="," file2
Then one can just as well test with FS!=",".
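A sketch of that idea, assuming a POSIX-like shell with mktemp and made-up files: file1 is whitespace-separated, file2 is comma-separated, and the value of FS doubles as the "which file" flag:

```shell
#!/bin/sh
# Scratch directory so the sample files don't overwrite real ones
cd "$(mktemp -d)" || exit 1
printf '%s\n' '2 x' '3 y' > file1                 # whitespace-separated keys
printf '%s\n' 1,one 2,two 3,three 4,four > file2  # comma-separated data
# FS keeps its default (a single space) while file1 is read,
# and is set to "," before file2 is opened
out=$(awk '
FS!="," {A[$1]; next}
($1 in A)
' file1 FS=, file2)
printf '%s\n' "$out"
```

The lines 2,two and 3,three are printed, because file2 is split on commas while file1 still used the default whitespace splitting.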
You can also let awk directly examine the arguments given to it:
awk '
BEGIN {	printf("ARGV[0]=%s\n", ARGV[0])
	for(i = 1; i < ARGC; i++)
		if(ARGV[i] ~ /=/)
			printf("ARGV[%d]=%s: assignment\n", i, ARGV[i])
		else {	printf("ARGV[%d]=%s: file operand\n", i, ARGV[i])
			# remember the first file operand
			if(!f1) f1 = ARGV[i]
		}
	print ""
}
FILENAME == f1 {
	# Process lines from 1st file here...
	printf("From 1st file(%s); %s\n", f1, $0)
	next
}
{	# Process remaining files here...
	printf("From subsequent file(%s): %s\n", FILENAME, $0)
}' FS=, empty_file OFS='|' file1 FS='|' file2
If empty_file is an empty file, file1 contains:
f1 line1
f1 line2
and file2 contains:
f2 line1
f2 line2
it produces the output:
ARGV[0]=awk
ARGV[1]=FS=,: assignment
ARGV[2]=empty_file: file operand
ARGV[3]=OFS=|: assignment
ARGV[4]=file1: file operand
ARGV[5]=FS=|: assignment
ARGV[6]=file2: file operand
From subsequent file(file1): f1 line1
From subsequent file(file1): f1 line2
From subsequent file(file2): f2 line1
From subsequent file(file2): f2 line2
and if the last line of the script is changed to:
}' FS=, OFS='|' file1 FS='|' file2
it produces the output:
ARGV[0]=awk
ARGV[1]=FS=,: assignment
ARGV[2]=OFS=|: assignment
ARGV[3]=file1: file operand
ARGV[4]=FS=|: assignment
ARGV[5]=file2: file operand
From 1st file(file1); f1 line1
From 1st file(file1); f1 line2
From subsequent file(file2): f2 line1
From subsequent file(file2): f2 line2
This might work on some systems, but the standards say that the value of FILENAME in a BEGIN clause is undefined.
In awk on OS X, FILENAME expands to an empty string (or 0, depending on context) in a BEGIN action.
Thanks Don, I thought that ARGV was GNU magic.
So my first example should be improved like this:
awk '
BEGIN {
	# f1 = first operand that is not a var=value assignment
	for (i=1; i<ARGC; i++) if (ARGV[i]!~"=") {f1=ARGV[i]; break}
}
FILENAME==f1 {A[$1]; next}
($1 in A)
' file1 file2
And it works nicely with shell wildcards like file*!
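A sketch of the ARGV-based version driven by a wildcard, assuming a POSIX-like shell with mktemp and made-up sample files. The first file is empty and the test is inverted to !($1 in A), the case where NR==FNR failed:

```shell
#!/bin/sh
# Scratch directory so the glob only matches the sample files
cd "$(mktemp -d)" || exit 1
: > file1                     # empty first file
printf '%s\n' 1 2 > file2
printf '%s\n' 3 > file3
# file* expands to file1 file2 file3; f1 is taken from ARGV, not from NR==FNR
out=$(awk '
BEGIN {
  for (i = 1; i < ARGC; i++) if (ARGV[i] !~ "=") {f1 = ARGV[i]; break}
}
FILENAME==f1 {A[$1]; next}
!($1 in A)
' file*)
printf '%s\n' "$out"
```

Because f1 is "file1" even though that file has no records, every line of file2 and file3 is printed.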
BTW, most awk versions want if (... ~ "=") instead of if (... ~ /=/), even outside a {block}.
They have trouble parsing the characters (, ) and = inside / /, but not inside " ".
Thanks for the warning.
The standards say that the right-hand operand of the ~ and !~ operators can always be a string containing an ERE or an ERE token (i.e., /ERE/). But if there is an ambiguity as to whether a / is a division operator or part of an ERE token, awk is supposed to assume it is a division operator. In a simple if statement like this, there shouldn't be any ambiguity.
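For scripts that must run on many awks, one hedge is to avoid a regexp constant that begins with "=" altogether. The GNU awk manual notes that such a constant can be confused with the /= assignment operator by some awk versions, which may be part of the trouble described above. The sketch below classifies operands with a string ERE and with a bracket expression; it assumes a POSIX-like shell, and since the program has only a BEGIN action, the file operand is never opened:

```shell
#!/bin/sh
# Two spellings of "operand contains =" that never start a regexp with "="
out=$(awk '
BEGIN {
  for (i = 1; i < ARGC; i++) {
    s = (ARGV[i] ~ "=")   ? "assignment" : "file operand"  # ERE given as a string
    r = (ARGV[i] ~ /[=]/) ? "assignment" : "file operand"  # bracket expression
    print ARGV[i] ": " s " / " r
  }
}' FS=, file1)
printf '%s\n' "$out"
```

Both columns agree: FS=, is reported as an assignment and file1 as a file operand.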