Undesired removal of white space with awk

calbrex · June 20, 2012, 2:24pm

Hi,

I'm fairly new to scripting and I have a problem that I am having difficulty solving.

What I'd like to do is run an awk script to adjust the string in the first field depending on the string in another field. This is best explained with an example:

Here is my script:

cat find_replace.awk
BEGIN{}
{
    if ($1 == "01" && $6 == "55"){
    sub("01","07",$1);{print $0}
    }
    else if ($1 == "02" && $6 == "55"){
    sub("02","17",$1);{print $0}
    }
    else if ($1 == "31" && $6 == "55"){
    sub("31","67",$1);{print $0}
    }
    else if ($1 == "36" && $6 == "55"){
    sub("36","67",$1);{print $0}
    }
    else {print $0}
}

For each line, the above script reads the string in the first and sixth fields. Should the first field contain 01, 02, 31 or 36 and the sixth field contain 55, then the number in the first field is substituted and the line printed. Otherwise, the line is printed unaltered.

Here is my input file:

 cat test_input
01 ABCDEFGH AB  AB 1234       04   1  12
01 AAAAAAAA AA  AA 1111       55   1  11
02 AAAAAAAA AA  AA 1111       55   1  11
31 AAAAAAAA AA  AA 1111       55   1  11
36 AAAAAAAA AA  AA 1111       55   1  11
36 AAAAAAAA AA  AA 1111       77   1  11
94 AAAAAAAA AA  AA 1111       63   1  11

and here is the actual output:

 gawk -f find_replace.awk test_input
01 ABCDEFGH AB  AB 1234       04   1  12
07 AAAAAAAA AA AA 1111 55 1 11
17 AAAAAAAA AA AA 1111 55 1 11
67 AAAAAAAA AA AA 1111 55 1 11
67 AAAAAAAA AA AA 1111 55 1 11
36 AAAAAAAA AA  AA 1111       77   1  11
94 AAAAAAAA AA  AA 1111       63   1  11

For reasons that I do not understand, the sub commands have stripped the white space between the fields and replaced it with a single space, despite using a print $0 statement. Is there another command that I can use instead of sub that will preserve the white space?

The output that I am looking for is:

 
01 ABCDEFGH AB  AB 1234       04   1  12
07 AAAAAAAA AA  AA 1111       55   1  11
17 AAAAAAAA AA  AA 1111       55   1  11
67 AAAAAAAA AA  AA 1111       55   1  11
67 AAAAAAAA AA  AA 1111       55   1  11
36 AAAAAAAA AA  AA 1111       77   1  11
94 AAAAAAAA AA  AA 1111       63   1  11

where the correct number of spaces is preserved between the fields.

I'm sorry if this problem has been addressed before - I did a brief search, but I couldn't find a solution. Please feel free to direct me to a solution if this has been addressed before. Also, if possible, I'd prefer to keep the script in a file that can be called using awk -f scriptname.

Franklin52 · June 20, 2012, 2:42pm

You can do something like this:

{
    # Edit a copy of the whole record: calling sub() on $1 would make awk
    # rebuild $0 with single spaces between the fields.
    s = $0
    if ($1 == "01" && $6 == "55")
        sub("^01", "07", s)
    else if ($1 == "02" && $6 == "55")
        sub("^02", "17", s)
    else if ($1 == "31" && $6 == "55")
        sub("^31", "67", s)
    else if ($1 == "36" && $6 == "55")
        sub("^36", "67", s)
    print s
}
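
For anyone wondering why the spacing collapsed in the first place: when a field such as $1 is changed, awk rebuilds $0 by joining all the fields with OFS, which is a single space by default. Changing $0 itself (or a copy of it) does not trigger that rebuild. A minimal sketch of the difference, using one line from the sample input (the variable names line, rebuilt and preserved are only for illustration):

```shell
# One record from the sample input, with its irregular spacing.
line='01 AAAAAAAA AA  AA 1111       55   1  11'

# sub() on $1 changes a field, so awk rebuilds $0 using OFS (one space).
rebuilt=$(printf '%s\n' "$line" | awk '{ sub("01", "07", $1); print }')

# sub() with no third argument edits $0 directly; the spacing is left alone.
preserved=$(printf '%s\n' "$line" | awk '{ sub(/^01/, "07"); print }')

printf '%s\n' "$rebuilt" "$preserved"
```

The first line printed has single spaces between the fields; the second keeps the original column layout.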

in2nix4life · June 20, 2012, 2:47pm

Or if you like long one-liners ;-), this works:

awk '$1 ~ /[01|02|31|36]/ && $6 ~ /55/{sub("01","07");sub("02","17");sub("31","67");sub("36","67");print}' file

07 AAAAAAAA AA  AA 1111       55   1  11
17 AAAAAAAA AA  AA 1111       55   1  11
67 AAAAAAAA AA  AA 1111       55   1  11
67 AAAAAAAA AA  AA 1111       55   1  11

calbrex · June 20, 2012, 2:59pm

Thank you so much, Franklin52 and in2nix4life! I can't believe that the solution was so straightforward.

I prefer to have a script file so that I can send it to others to use, although the one-liner is also useful. I'm guessing that I could make the one-liner even longer by getting it to print the lines that do not have replaced strings, like the script in my opening post.

Once again, thanks for your responses.
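
For the script-file preference, here is one possible sketch that prints every line, changed or not. It is not any poster's exact solution: the file name replace_first.awk and the map array are hypothetical, and it assumes the records have no leading white space, as in the sample input.

```shell
# Write a hypothetical script file that can be run with awk -f.
script="${TMPDIR:-/tmp}/replace_first.awk"
cat > "$script" <<'EOF'
# Old first-field values and their replacements (used only when $6 is 55).
BEGIN { map["01"] = "07"; map["02"] = "17"; map["31"] = "67"; map["36"] = "67" }

$6 == "55" && ($1 in map) {
    # Edit $0, anchored at the start of the record, so the spacing is untouched.
    sub("^" $1, map[$1])
}

# A bare true pattern prints every record, whether or not it was changed.
1
EOF

out=$(printf '%s\n' \
    '01 ABCDEFGH AB  AB 1234       04   1  12' \
    '01 AAAAAAAA AA  AA 1111       55   1  11' \
    '36 AAAAAAAA AA  AA 1111       77   1  11' |
    awk -f "$script")
printf '%s\n' "$out"
```

Only the middle record should change (01 becomes 07); the other two pass through as they are, and all three keep their spacing.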

Scrutinizer · June 20, 2012, 3:34pm

awk '$6==55{sub("^01 ","07 "); sub("^02 ","17 "); sub(/^3[16] /,"67 ")}1' infile

--

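The pattern /[01|02|31|36]/ in the one-liner above is a bracket expression, which matches any single one of the listed characters, so that test is equivalent to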

awk '$1 ~ /[01236|]/ && [..]

This may be what you were after:

awk '$1 ~ /^(01|02|31|36)$/
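
To see the difference between the two patterns, a small sketch that tests a few first-field values against both (the values 63 and 94 are taken from the sample data or made up for illustration; the variable name result is arbitrary):

```shell
# For each value, report whether the bracket expression and the anchored
# alternation match it.
result=$(printf '%s\n' 01 36 63 94 |
awk '{
    bracket  = ($1 ~ /[01|02|31|36]/)   ? "yes" : "no"
    anchored = ($1 ~ /^(01|02|31|36)$/) ? "yes" : "no"
    print $1, bracket, anchored
}')
printf '%s\n' "$result"
```

The bracket expression also accepts 63, because it only needs one of the characters 0, 1, 2, 3, 6 or | to appear anywhere in the field, whereas the anchored alternation accepts only the four exact values.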