Difficulties in matching left bracket as literal in awk

I need to work with records that have #AX in the EXP1 field. Please see my data sample and my attempt below:

$ cat xx
08:30:33 KEY1 (1255) EXP1 [#AX0010X001] VAL:20AX0030006
08:30:33 KEY1 (1255) EXP1 [#0200000001] VAL:20AX0030006
08:30:33 KEY1 (1255) EXP1 [#AX0020X002] VAL:20AW0030006
08:30:33 KEY1 (1255) EXP1 [#A02210X001] VAL:20AW0030006
$ gawk '{ if($0 ~ "\[#AX") print; }' xx
gawk: cmd. line:1: warning: escape sequence `\[' treated as plain `['
gawk: cmd. line:1: (FILENAME=xx FNR=1) fatal: Unmatched [ or [^: /[#AX/

I tried another way to specify the left bracket and it worked:

gawk '{ if($0 ~ "[[]#AX") print; }' xx
08:30:33 KEY1 (1255) EXP1 [#AX0010X001] VAL:20AX0030006
08:30:33 KEY1 (1255) EXP1 [#AX0020X002] VAL:20AW0030006

I tried awk, nawk and gawk with the same results, although gawk is more helpful in spelling out which file and line caused the problem when I used the '\[' escape sequence.

My question is: why does awk die, and why does gawk show this warning about '\[' being treated as plain '['?

Thanks in advance.

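This is because you are passing the RE as a string. The string is parsed first, and one level of escape characters is stripped. The result is then processed by the RE parser, which removes another level of escape characters. In your case "\[#AX" is not a recognized string escape, so gawk warns and keeps a plain [. The RE parser then receives [#AX, which it reads as the start of an unterminated bracket expression, and that is why awk dies.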
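The first stage can be seen on its own by printing the string instead of matching with it. This is only a minimal sketch, assuming a POSIX shell and any awk on the PATH; the variable name `shown` is made up for illustration:

```shell
# Print the string constant to see what is left after awk's string parser
# has removed one level of backslashes. This is what the RE parser would
# receive if the same string were used on the right-hand side of ~.
shown=$(awk 'BEGIN { print "\\[#AX" }')
printf '%s\n' "$shown"
```

One backslash survives the string parser, so the RE parser gets an escaped bracket rather than a bare one.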

Imagine looking for a double quote character. It is not special to the RE parser, so it needs no escaping there (only the string form needs a backslash, for the string parser), and you could do:

gawk '{ if($0 ~ "\"AX") print; }' xx
..or..
gawk '{ if($0 ~ /"AX/) print; }' xx

But as [ means something to the RE parser, it must arrive there escaped, i.e. double-escaped in a string (or single-escaped in an RE literal):

gawk '{ if($0 ~ "\\[#AX") print; }' xx
..or..
gawk '{ if($0 ~ /\[#AX/) print; }' xx
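The spellings discussed in this thread can be checked side by side against the sample data. This is a sketch under stated assumptions: a POSIX shell, any awk on the PATH, and a temporary file standing in for xx. The names `sample`, `from_string`, `from_literal` and `from_class` are illustrative and not from the original posts:

```shell
# Recreate the four sample records from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
08:30:33 KEY1 (1255) EXP1 [#AX0010X001] VAL:20AX0030006
08:30:33 KEY1 (1255) EXP1 [#0200000001] VAL:20AX0030006
08:30:33 KEY1 (1255) EXP1 [#AX0020X002] VAL:20AW0030006
08:30:33 KEY1 (1255) EXP1 [#A02210X001] VAL:20AW0030006
EOF

# String form: the string parser turns \\ into \, so the RE parser sees \[#AX.
from_string=$(awk '$0 ~ "\\[#AX"' "$sample")

# RE-literal form: there is no string parsing stage, so one backslash is enough.
from_literal=$(awk '$0 ~ /\[#AX/' "$sample")

# Bracket-expression form from the question: no backslash involved at all.
from_class=$(awk '$0 ~ "[[]#AX"' "$sample")

printf '%s\n' "$from_string"
rm -f "$sample"
```

All three should select the same two records, the ones whose EXP1 value starts with [#AX. The RE-literal form is usually the easiest to read when the pattern is fixed; the string form is mainly needed when the pattern is built at run time.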