Regarding change in column numbers after some commands

Hi All,

I was using some commands to:

  1. replace a column with a constant string
    awk -v a=CA 'NF>1{ $3=a; print; } ' $line>$line"_1"
  2. copy a column and paste it in another place
    awk '$5=$2" "$5' $line>$line"_2"
  3. delete the extra columns
    awk '{for(i=1;i<=NF;i++) line=(i==1)?$i:((i==6)?line:line OFS $i);print line}' $line>$line"_3"

But after running this sequence of commands, the column layout of the file has changed, which I do not want.

I need to use these files as input for another program, and it fails with an error because the column positions have changed.
Can anyone help me perform these operations without changing the column format?

If you show the input and the expected output, it will be easier to help you.

Initially, the file looked like this:

ATOM      1  N   LYS     1       3.440  10.397  -1.989  1.00 18.23
ATOM      2  CA  LYS     1       3.897  11.093  -3.203  1.00 17.09
ATOM      3  C   LYS     1       4.920  10.272  -3.962  1.00 16.14
ATOM      4  O   LYS     1       5.472   9.269  -3.488  1.00 15.98

After applying

awk -v a=CA 'NF>1{ $3=a; print; } ' $line>$line"_1"

to replace N, C and O with CA, the file looked like this:

ATOM 1 CA LYS 1 3.440 10.397 -1.989 1.00 18.23
ATOM 2 CA LYS 1 3.897 11.093 -3.203 1.00 17.09
ATOM 3 CA LYS 1 4.920 10.272 -3.962 1.00 16.14
ATOM 4 CA LYS 1 5.472 9.269 -3.488 1.00 15.98

Then, after applying

awk '$5=$2" "$5' $line>$line"_2"

to copy the 2nd column and paste it as the 5th column, I got:

ATOM 1 CA LYS 1 1 3.440 10.397 -1.989 1.00 18.23
ATOM 2 CA LYS 2 1 3.897 11.093 -3.203 1.00 17.09
ATOM 3 CA LYS 3 1 4.920 10.272 -3.962 1.00 16.14
ATOM 4 CA LYS 4 1 5.472 9.269 -3.488 1.00 15.98

Further, I applied

awk '{for(i=1;i<=NF;i++) line=(i==1)?$i:((i==6)?line:line OFS $i);print line}' $line>$line"_3"

to delete the extra 6th column, and the output looks like this:

ATOM 1 CA LYS 1 3.440 10.397 -1.989 1.00 18.23
ATOM 2 CA LYS 2 3.897 11.093 -3.203 1.00 17.09
ATOM 3 CA LYS 3 4.920 10.272 -3.962 1.00 16.14
ATOM 4 CA LYS 4 5.472 9.269 -3.488 1.00 15.98

Actually, after all these modifications, I need the file to look like this:

ATOM      1  CA   LYS     1       3.440  10.397  -1.989  1.00 18.23
ATOM      2  CA   LYS     2       3.897  11.093  -3.203  1.00 17.09
ATOM      3  CA   LYS     3       4.920  10.272  -3.962  1.00 16.14
ATOM      4  CA   LYS     4       5.472   9.269  -3.488  1.00 15.98

Please look at the column positions and the number of spaces between columns. After posting this reply, the number of spaces between the columns may appear the same, but that is not the case: there are 6 blank spaces between ATOM and 1 in the original file, whereas there is only 1 after the modifications.

That's because awk treats any run of spaces as a field separator (unless told otherwise), and when you assign to a field it rebuilds the line with a single space (the output field separator) between fields. As you can see, the formatting is already gone in the first command's output. If you need exactly formatted output, use printf for all the fields.
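
To illustrate the point about lost spacing, here is a minimal sketch. The sample line is taken from the file posted earlier; the variable names (line, unchanged, rebuilt) are only for illustration. It prints the same record twice, once untouched and once after assigning to a field.

```shell
# Sample record in the original fixed-width layout (from the thread).
line='ATOM      1  N   LYS     1       3.440  10.397  -1.989  1.00 18.23'

# No field is assigned, so awk prints $0 exactly as it was read.
unchanged=$(printf '%s\n' "$line" | awk '{ print }')

# Assigning to $3 makes awk rebuild $0, joining the fields with OFS
# (a single space by default), so the original spacing is lost.
rebuilt=$(printf '%s\n' "$line" | awk -v a=CA '{ $3 = a; print }')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first printed line keeps the original spacing; the second has single spaces throughout, which is what happened in the first command's output.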


Hi RudiC,

Thanks for your reply.
I don't actually know how to use printf in a shell script. Can you please give me a format with which I can convert my awk commands to printf form?

man printf

In your case, start with printf "%-9s%-3s\n", $1, $2 and then expand.

I tried this one:

gawk '{print $1,"     ", $2," ",$3," ",$4,"  ",$5,"     ",$6,"",$7,"",$8,"",$9,"",$10}' $line>new

After this I am getting:

ATOM       1   CA   LYS    1       4.816  8.341  0.644  1.00  0.18
ATOM       2   CA   LYS    2       4.790  8.614  -0.804  1.00  0.00
ATOM       3   CA   LYS    3       4.989  10.100  -1.068  1.00  0.83
ATOM       4   CA   LYS    4       3.902  10.937  -0.409  1.00  1.35
ATOM       5   CA   LYS    5       4.120  12.421  -0.673  1.00  1.50
ATOM       6   CA   LYS    6       3.037  13.264  -0.007  1.00  1.72
ATOM       7   CA   LYS    7       3.250  14.696  -0.268  1.00  1.92
ATOM       8   CA   LYS    8       5.891  7.843  -1.514  1.00  0.05
ATOM       9   CA   LYS    9       6.558  6.986  -0.925  1.00  0.17
ATOM       10   CA   SER    10       6.078  8.168  -2.782  1.00  0.03
ATOM       11   CA   SER    11       7.151  7.551  -3.567  1.00  0.00
ATOM       12   CA   SER    12       6.995  7.953  -5.030  1.00  0.15
ATOM       13   CA   SER    13       7.629  9.216  -5.194  1.00  0.08
ATOM       14   CA   SER    14       8.489  8.067  -3.059  1.00  0.07

But now I need to right-align the columns. I tried the printf command again for this, but it is not working for all the columns of the file or for multiple files in a script.
Can you please help me find a way to right-justify the columns for multiple files?

I'm sorry, I can't help if you don't read and heed my post.

Hi,

RudiC, thanks for your help. I got it.

I used the following command with printf and got the correct output:

awk '{printf ("%4s%7s%4s%5s%6s%12s%8s%8s%6s%6s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10);}' $line
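
For anyone finding this later: the three separate steps from the first post could probably be folded into this one printf command, so no intermediate files are needed. This is only a sketch under some assumptions: the input always has 10 whitespace-separated fields, the widths are the ones from the command above, and copying column 2 over column 5 is equivalent to the original "paste, then delete the 6th column" steps. The variable names input and result are made up for the example.

```shell
# Two sample records in the original layout (from the thread).
input='ATOM      1  N   LYS     1       3.440  10.397  -1.989  1.00 18.23
ATOM      4  O   LYS     1       5.472   9.269  -3.488  1.00 15.98'

result=$(printf '%s\n' "$input" | awk -v a=CA 'NF > 1 {
    $3 = a      # step 1: replace column 3 with the constant string
    $5 = $2     # steps 2 and 3: overwrite column 5 with column 2 directly
    # Print every field with an explicit width so the layout is fixed.
    printf "%4s%7s%4s%5s%6s%12s%8s%8s%6s%6s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10
}')

printf '%s\n' "$result"
```

Because every field goes through printf with a fixed width, it does not matter that the field assignments collapse the spacing of $0 internally.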


Very good, congrats!
You may want to have string fields left justified; then, use a minus sign: "%-4s".
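
A small sketch of the difference the minus sign makes; the brackets are there only to make the padding visible, and the variable names right and left are just for illustration:

```shell
# "%4s" pads on the left, so the text is right-justified in 4 columns.
right=$(awk 'BEGIN { printf "[%4s]", "CA" }')

# "%-4s" pads on the right, so the text is left-justified in 4 columns.
left=$(awk 'BEGIN { printf "[%-4s]", "CA" }')

printf '%s\n%s\n' "$right" "$left"
```

The first line shows CA pushed to the right edge of the 4-character field, the second shows it at the left edge with trailing spaces.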
