(g)awk conditional substitution issues when attempting to delete character

A portion of my input is as follows:

 1087  IKON01,49 A WA-                                 -1 .       -1 .           0 W               WA-                                 -1 .        -1 .         0 .     -1 .        -1 -1 -1 -1 -1 -1 W
 1088  IKON01,49 A J.@QU80MW.                           2 !J.@!    0 .           0 QWM[            QUM                                  7 [W.       0 .         0 .      0 .         0 11  3  3  2 -1 JQMW
 1089  IKON01,49 A K.@L&                               -1 .       -1 .          -6 KL/             K.@L                                -1 .         1 /         0 .      0 .        -1 -1 -1  1  2  1 KL

I would like the following desired output:

 1087  IKON01,49 A WA-                                 -1 .       -1 .           0 W               WA-                                 -1 .        -1 .         0 .     -1 .        -1 -1 -1 -1 -1 -1 W
 1088  IKON01,49 A J.@QU80MW.                           2 !J.@!    0 .           0 QM[             QUM                                  7 [W.       0 .         0 .      0 .         0 11  3  3  2 -1 JQMW
 1089  IKON01,49 A K.@L&                               -1 .       -1 .          -6 KL/             K.@L                                -1 .         1 /         0 .      0 .        -1 -1 -1  1  2  1 KL

In essence, I would like to delete every W in field $9 while preserving the original, pre-substitution formatting, given the following regex condition:

if($9 ~/^.W[^H]=*\[$/)

However, I would want the formatting of the file to be preserved. I realize this has been dealt with in previous posts, and I know how to use printf and/or FIELDWIDTHS (with gawk), but since my file is 61 fields long (NF==61; I've only presented a portion here), this is tremendously cumbersome and messy. In addition, I do not know every field width, so I would like to avoid having to work them all out just to reformat the file.

I've had a similar issue in the past, and RudiC helped me via a very nifty trick taking advantage of NF being recomputed when there is an assignment to $0. Thus, I attempted the following:

gawk '$9 ~/^.W[^H]=*\[$/{X=$9; sub(/W/,"",X); sub ($9, X, $0)}' file.txt

This time, however, it seems as though the operation is aborted during the field substitution because of the unescaped metacharacter "[" in my data: $9 is used as a dynamic regex there. This produces the following error, triggered by field 9 of a different line:

fatal: Invalid regular expression: /BW>[/

In light of this, I've also attempted:

gawk '{if($9 ~/^.W[^H]=*\[$/); sub(/W/,"",$9); print}' file.txt

Not only does this ruin the formatting of the file, but it is also matching lines I wouldn't expect it to such as:

1  IKON01,01 A W:-                                 -1 .       -1 .           0 W               W:-                                 -1 .        -1 .         0 .     -1 .        -1 -1 -1 -1 -1 -1 W

Thank you so much in advance for helping me through this quagmire.


This is probably fairly obvious, but I should say that in my data examples from my post, the numbers on the far left are line numbers and not $1.

Thank you.

This seems to work for me:

awk '$9 ~ "^.W[^H]=*\[$" {X=$9; sub(/W/, "", X); sub ($9 "[]", X " ", $0)} 1' file
IKON01,49 A WA-                                 -1 .       -1 .           0 W               WA-                                 -1 .        -1 .         0 .     -1 .        -1 -1 -1 -1 -1 -1 W
IKON01,49 A J.@QU80MW.                           2 !J.@!    0 .           0 QM[             QUM                                  7 [W.       0 .         0 .      0 .         0 11  3  3  2 -1 JQMW
IKON01,49 A K.@L&                               -1 .       -1 .          -6 KL/             K.@L                                -1 .         1 /         0 .      0 .        -1 -1 -1  1  2  1 KL

We're making the "[" that ends the field part of a "bracket expression" (cf. man regex): in the second sub statement, the field's own trailing "[" serves as the opening bracket, and the appended "[]" string constant supplies the "[" character to match plus the closing bracket. So for $9 == "QWM[" the dynamic regex becomes QWM[[], which matches the literal text QWM[. We also need to append a space to the replacement to maintain the field length and thus the $0 formatting.
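If other regex metacharacters could show up in $9, one possible alternative is to avoid using the field as a regex at all and splice the line with index() and substr(), which work on literal text. The following is only a sketch under assumptions: the sample line is a shortened, made-up version of the data above, and it shares the limitation of the sub() approach that the first occurrence of the field text in $0 is the one replaced.

```shell
# Sketch: delete the first W in $9 without ever treating $9 as a regex.
# The sample line is a shortened, made-up version of the data in this thread.
line='IKON01,49 A J.@QU80MW.   2 !J.@!    0 .   0 QWM[     QUM   7 [W.'

out=$(printf '%s\n' "$line" | awk '
$9 ~ /^.W[^H]=*\[$/ {
    old = $9
    new = old
    sub(/W/, "", new)                          # drop the first W from a copy
    new = sprintf("%-" length(old) "s", new)   # pad with spaces to the old width
    pos = index($0, old)                       # literal search, no regex involved
    $0 = substr($0, 1, pos - 1) new substr($0, pos + length(old))
}
1')

printf '%s\n' "$out"
```

Because the padded replacement has the same length as the original field, every later column stays where it was.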


Thanks so much for this RudiC.

With your current line of code, the conditional, viz.,

$9 ~ "^.W[^H]=*\[$"

produced the following warning and fatal error:

gawk: cmd. line:1: warning: escape sequence `\[' treated as plain `['
gawk: cmd. line:1: (FILENAME=Kings.qdf FNR=1) fatal: Unmatched [, [^, [:, [., or [=: /^.W[^H]=*[$/

When I changed it from a string constant to a regex constant, i.e.,

$9 ~ /^.W[^H]=*\[$/ 

it did the trick!
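As far as I can tell, the difference comes from awk processing escape sequences in a string constant before the result is used as a regex: "\[" is not a recognised string escape, so it is reduced to a plain "[", which then opens an unterminated bracket expression. A small sketch of three spellings that should all behave the same ("QWM[" is just a made-up sample value):

```shell
# Sketch: the same condition written three ways; "QWM[" is a made-up sample value.
out=$(printf 'QWM[\n' | awk '
{
    # Regex constant: the backslash reaches the regex engine unchanged.
    if ($1 ~ /^.W[^H]=*\[$/)   print "regex constant: match"
    # String constant: the backslash must be doubled to survive string processing.
    if ($1 ~ "^.W[^H]=*\\[$")  print "string, doubled backslash: match"
    # String constant: a bracket expression avoids the backslash altogether.
    if ($1 ~ "^.W[^H]=*[[]$")  print "string, bracket expression: match"
}')

printf '%s\n' "$out"
```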

I cannot thank you enough for this and look forward to studying this to see how you got it to work.
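
A note on the second attempt from the first post, since it was not addressed above: the likely reason it matched unexpected lines is the semicolon directly after if (...). That semicolon is an empty statement, so the if controls nothing and sub() runs on every line. Assigning to $9 also makes awk rebuild $0 with single spaces, which would explain the lost formatting. A minimal sketch with made-up two-field input:

```shell
# Sketch: "if (cond);" versus "if (cond) stmt". The two input lines are made up.
input='W QWM[
W W'

# Stray semicolon: the if controls an empty statement, so sub() runs on every line.
bad=$(printf '%s\n' "$input" | awk '{ if ($2 ~ /^.W[^H]=*\[$/); sub(/W/, "", $2); print }')

# Semicolon removed: sub() runs only when the condition holds.
good=$(printf '%s\n' "$input" | awk '{ if ($2 ~ /^.W[^H]=*\[$/) sub(/W/, "", $2); print }')

printf 'with semicolon:\n%s\n\nwithout semicolon:\n%s\n' "$bad" "$good"
```

In the first variant the second line loses its W even though it does not satisfy the condition; in the second variant it is left alone.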