Taking largest (negative) number from column of coordinates and adding positive form to every other

Hello all,

I'm new to the forums and hope to be able to contribute something useful in the future; however I must admit that what has prompted me to join is the fact that currently I need help with something that has me at the end of my tether.

I have a PDB (Protein Data Bank) file which I have condensed to the following for ease of modification:

HETATM    1  C   UNK     0     -25.639   7.865   8.470  0.00  0.00           C+0
HETATM    2  C   UNK     0     -24.868   8.448   7.265  0.00  0.00           C+0
HETATM    3  C   UNK     0     -23.332   8.257   7.385  0.00  0.00           C+0
HETATM    4  C   UNK     0     -22.562   8.844   6.174  0.00  0.00           C+0
HETATM    5  C   UNK     0     -21.028   8.648   6.301  0.00  0.00           C+0
HETATM    6  C   UNK     0     -20.246   9.232   5.094  0.00  0.00           C+0
HETATM    7  C   UNK     0     -18.714   9.031   5.231  0.00  0.00           C+0
HETATM    8  C   UNK     0     -17.931   9.614   4.026  0.00  0.00           C+0
HETATM    9  C   UNK     0     -16.400   9.414   4.163  0.00  0.00           C+0
HETATM   10  C   UNK     0     -15.609   9.993   2.963  0.00  0.00           C+0
HETATM   11  C   UNK     0     -14.078   9.783   3.118  0.00  0.00           C+0
HETATM   12  C   UNK     0     -13.276  10.358   1.924  0.00  0.00           C+0
HETATM   13  C   UNK     0     -11.741  10.165   2.048  0.00  0.00           C+0
HETATM   14  C   UNK     0     -10.997  10.761   0.823  0.00  0.00           C+0
HETATM   15  C   UNK     0      -9.453  10.598   0.887  0.00  0.00           C+0
HETATM   16  C   UNK     0      -8.744  11.204  -0.349  0.00  0.00           C+0
HETATM   17  O   UNK     0      -7.420  11.028  -0.226  0.00  0.00           O+0
HETATM   18  P   UNK     0      -6.354  11.451  -1.177  0.00  0.00           P+0
HETATM   19  O   UNK     0      -5.034  11.048  -0.629  0.00  0.00           O+0
HETATM   20  O   UNK     0      -6.380  12.923  -1.348  0.00  0.00           O+0
HETATM   21  O   UNK     0      -6.556  10.795  -2.491  0.00  0.00           O+0
HETATM   22  H   UNK     0      -9.116  10.704  -1.250  0.00  0.00           H+0
HETATM   23  H   UNK     0      -8.986  12.271  -0.408  0.00  0.00           H+0
HETATM   24  H   UNK     0      -9.199   9.536   0.949  0.00  0.00           H+0
HETATM   25  H   UNK     0      -9.070  11.093   1.785  0.00  0.00           H+0
HETATM   26  H   UNK     0     -11.237  11.827   0.752  0.00  0.00           H+0
HETATM   27  H   UNK     0     -11.366  10.270  -0.084  0.00  0.00           H+0
HETATM   28  H   UNK     0     -11.381  10.656   2.956  0.00  0.00           H+0
HETATM   29  H   UNK     0     -11.510   9.098   2.119  0.00  0.00           H+0
HETATM   30  H   UNK     0     -13.488  11.427   1.843  0.00  0.00           H+0
HETATM   31  H   UNK     0     -13.616   9.871   1.007  0.00  0.00           H+0
HETATM   32  H   UNK     0     -13.745  10.270   4.038  0.00  0.00           H+0
HETATM   33  H   UNK     0     -13.874   8.713   3.202  0.00  0.00           H+0
HETATM   34  H   UNK     0     -15.820  11.063   2.880  0.00  0.00           H+0
HETATM   35  H   UNK     0     -15.949   9.507   2.044  0.00  0.00           H+0
HETATM   36  H   UNK     0     -16.190   8.344   4.245  0.00  0.00           H+0
HETATM   37  H   UNK     0     -16.061   9.901   5.082  0.00  0.00           H+0
HETATM   38  H   UNK     0     -18.146  10.684   3.947  0.00  0.00           H+0
HETATM   39  H   UNK     0     -18.275   9.125   3.110  0.00  0.00           H+0
HETATM   40  H   UNK     0     -18.372   9.519   6.150  0.00  0.00           H+0
HETATM   41  H   UNK     0     -18.501   7.961   5.313  0.00  0.00           H+0
HETATM   42  H   UNK     0     -20.460  10.302   5.014  0.00  0.00           H+0
HETATM   43  H   UNK     0     -20.589   8.743   4.177  0.00  0.00           H+0
HETATM   44  H   UNK     0     -20.681   9.137   7.216  0.00  0.00           H+0
HETATM   45  H   UNK     0     -20.810   7.579   6.379  0.00  0.00           H+0
HETATM   46  H   UNK     0     -22.909   8.355   5.259  0.00  0.00           H+0
HETATM   47  H   UNK     0     -22.780   9.912   6.096  0.00  0.00           H+0
HETATM   48  H   UNK     0     -23.110   7.188   7.462  0.00  0.00           H+0
HETATM   49  H   UNK     0     -22.982   8.746   8.299  0.00  0.00           H+0
HETATM   50  H   UNK     0     -25.095   9.516   7.190  0.00  0.00           H+0
HETATM   51  H   UNK     0     -25.224   7.959   6.354  0.00  0.00           H+0
HETATM   52  H   UNK     0     -26.712   8.024   8.339  0.00  0.00           H+0
HETATM   53  H   UNK     0     -25.456   6.791   8.555  0.00  0.00           H+0
HETATM   54  H   UNK     0     -25.326   8.355   9.395  0.00  0.00           H+0

What I want to do is take the largest negative number from each of the three columns after the column containing nothing but '0' values (-26.712 in the first column, for example), and add the positive value of those single numbers to every single value in their respective columns. This is because I require that there be no negative numbers in the output.

I've tried all sorts of combinations of (g)awk, sed, grep in Bash, and various Python scripts (which I think is probably a more suitable language for this sort of task) but nothing has done it.

I'm still a relative newbie so am probably being ignorant about something obvious; please bear with me. Any help would be greatly appreciated.

1 Like

Welcome to the forums and thanks for using the code tags.

Could you take say the first 4 rows, a provide a desired output, please.

[s]That always seems to be the first column of that group in your data, though I presume it's not always so.

But that's why it always becomes zero here:

$ awk '{ MIN=0;
         for(N=6; N<=8; N++) if($N < MIN) MIN=$N
         for(N=6; N<=8; N++) $N -= MIN }' < data
HETATM 1 C UNK 0 0 33.504 34.109 0.00 0.00 C+0
HETATM 2 C UNK 0 0 33.316 32.133 0.00 0.00 C+0
HETATM 3 C UNK 0 0 31.589 30.717 0.00 0.00 C+0
HETATM 4 C UNK 0 0 31.406 28.736 0.00 0.00 C+0
HETATM 5 C UNK 0 0 29.676 27.329 0.00 0.00 C+0
HETATM 6 C UNK 0 0 29.478 25.34 0.00 0.00 C+0
HETATM 7 C UNK 0 0 27.745 23.945 0.00 0.00 C+0
HETATM 8 C UNK 0 0 27.545 21.957 0.00 0.00 C+0
HETATM 9 C UNK 0 0 25.814 20.563 0.00 0.00 C+0
HETATM 10 C UNK 0 0 25.602 18.572 0.00 0.00 C+0
HETATM 11 C UNK 0 0 23.861 17.196 0.00 0.00 C+0
HETATM 12 C UNK 0 0 23.634 15.2 0.00 0.00 C+0
HETATM 13 C UNK 0 0 21.906 13.789 0.00 0.00 C+0
HETATM 14 C UNK 0 0 21.758 11.82 0.00 0.00 C+0
HETATM 15 C UNK 0 0 20.051 10.34 0.00 0.00 C+0
HETATM 16 C UNK 0 0 19.948 8.395 0.00 0.00 C+0
HETATM 17 O UNK 0 0 18.448 7.194 0.00 0.00 O+0
HETATM 18 P UNK 0 0 17.805 5.177 0.00 0.00 P+0
HETATM 19 O UNK 0 0 16.082 4.405 0.00 0.00 O+0
HETATM 20 O UNK 0 0 19.303 5.032 0.00 0.00 O+0
HETATM 21 O UNK 0 0 17.351 4.065 0.00 0.00 O+0
HETATM 22 H UNK 0 0 19.82 7.866 0.00 0.00 H+0
HETATM 23 H UNK 0 0 21.257 8.578 0.00 0.00 H+0
HETATM 24 H UNK 0 0 18.735 10.148 0.00 0.00 H+0
HETATM 25 H UNK 0 0 20.163 10.855 0.00 0.00 H+0
HETATM 26 H UNK 0 0 23.064 11.989 0.00 0.00 H+0
HETATM 27 H UNK 0 0 21.636 11.282 0.00 0.00 H+0
HETATM 28 H UNK 0 0 22.037 14.337 0.00 0.00 H+0
HETATM 29 H UNK 0 0 20.608 13.629 0.00 0.00 H+0
HETATM 30 H UNK 0 0 24.915 15.331 0.00 0.00 H+0
HETATM 31 H UNK 0 0 23.487 14.623 0.00 0.00 H+0
HETATM 32 H UNK 0 0 24.015 17.783 0.00 0.00 H+0
HETATM 33 H UNK 0 0 22.587 17.076 0.00 0.00 H+0
HETATM 34 H UNK 0 0 26.883 18.7 0.00 0.00 H+0
HETATM 35 H UNK 0 0 25.456 17.993 0.00 0.00 H+0
HETATM 36 H UNK 0 0 24.534 20.435 0.00 0.00 H+0
HETATM 37 H UNK 0 0 25.962 21.143 0.00 0.00 H+0
HETATM 38 H UNK 0 0 28.83 22.093 0.00 0.00 H+0
HETATM 39 H UNK 0 0 27.4 21.385 0.00 0.00 H+0
HETATM 40 H UNK 0 0 27.891 24.522 0.00 0.00 H+0
HETATM 41 H UNK 0 0 26.462 23.814 0.00 0.00 H+0
HETATM 42 H UNK 0 0 30.762 25.474 0.00 0.00 H+0
HETATM 43 H UNK 0 0 29.332 24.766 0.00 0.00 H+0
HETATM 44 H UNK 0 0 29.818 27.897 0.00 0.00 H+0
HETATM 45 H UNK 0 0 28.389 27.189 0.00 0.00 H+0
HETATM 46 H UNK 0 0 31.264 28.168 0.00 0.00 H+0
HETATM 47 H UNK 0 0 32.692 28.876 0.00 0.00 H+0
HETATM 48 H UNK 0 0 30.298 30.572 0.00 0.00 H+0
HETATM 49 H UNK 0 0 31.728 31.281 0.00 0.00 H+0
HETATM 50 H UNK 0 0 34.611 32.285 0.00 0.00 H+0
HETATM 51 H UNK 0 0 33.183 31.578 0.00 0.00 H+0
HETATM 52 H UNK 0 0 34.736 35.051 0.00 0.00 H+0
HETATM 53 H UNK 0 0 32.247 34.011 0.00 0.00 H+0
HETATM 54 H UNK 0 0 33.681 34.721 0.00 0.00 H+0
$
```[/s]
 Doesn't do what you want.

Thank you very much indeed for the quick responses. Corona the code you have suggested is clearly almost exactly what I need, but some of the output coordinates seem to be slightly off. My desired output for the first 4 rows, with the original unchanged 4 rows placed beforehand for comparison are as follows:-

Original (example):

HETATM    1  C   UNK     0     -25.639   7.865   8.470  0.00  0.00           C+0
HETATM    2  C   UNK     0     -24.868   8.448   7.265  0.00  0.00           C+0
HETATM    3  C   UNK     0     -23.332   8.257   7.385  0.00  0.00           C+0
HETATM    4  C   UNK     0     -22.562   8.844   6.174  0.00  0.00           C+0

Desired (example):

HETATM    1  C   UNK     0     0       7.865   8.470  0.00  0.00           C+0
HETATM    2  C   UNK     0     0.771   8.448   7.265  0.00  0.00           C+0
HETATM    3  C   UNK     0     2.307   8.257   7.385  0.00  0.00           C+0
HETATM    4  C   UNK     0     3.077   8.844   6.174  0.00  0.00           C+0

Do you care about the output spacing? It reduces it to single spaces here, but you could make it tabs with awk -v OFS="\t" ...

---------- Post updated at 11:02 AM ---------- Previous update was at 11:01 AM ----------

I don't understand this output at all. What formula do you get 0.771 from? How does the smallest negative number manage to not become zero when you subtract it from itself? Why don't the other two change when you're adding to every single column?

---------- Post updated at 11:05 AM ---------- Previous update was at 11:02 AM ----------

I think I get it. You're wanting the largest negative number in the entire file.

---------- Post updated at 11:11 AM ---------- Previous update was at 11:05 AM ----------

It has to process the data twice, since it won't know the least value until the data's finished.

$ cat least.awk
BEGIN {
        # Print output tab-separated
        OFS="\t"
        # Read lines, finding the minimum from columns 6 through 8
        while(getline < FILE)
        for(N=6; N<=8; N++) if($N < MIN) MIN=$N
        # Close FILE so we can process it again from the start
        close(FILE);

        # Read and print each line, subtracting the min value from
        # columns 6-8
        while(getline < FILE)
        {
                for(N=6; N<=8; N++) $N -= MIN
                print
        }
        # Quit right here, don't go into the main awk processing loop
        exit
}
$ gawk -f least.awk -v FILE="data" # Give it filename as FILE
HETATM  1       C       UNK     0       1.073   34.577  35.182  0.00    0.00   C+0
HETATM  2       C       UNK     0       1.844   35.16   33.977  0.00    0.00   C+0
HETATM  3       C       UNK     0       3.38    34.969  34.097  0.00    0.00   C+0
HETATM  4       C       UNK     0       4.15    35.556  32.886  0.00    0.00   C+0
HETATM  5       C       UNK     0       5.684   35.36   33.013  0.00    0.00   C+0
HETATM  6       C       UNK     0       6.466   35.944  31.806  0.00    0.00   C+0
HETATM  7       C       UNK     0       7.998   35.743  31.943  0.00    0.00   C+0
HETATM  8       C       UNK     0       8.781   36.326  30.738  0.00    0.00   C+0
HETATM  9       C       UNK     0       10.312  36.126  30.875  0.00    0.00   C+0
HETATM  10      C       UNK     0       11.103  36.705  29.675  0.00    0.00   C+0
HETATM  11      C       UNK     0       12.634  36.495  29.83   0.00    0.00   C+0
HETATM  12      C       UNK     0       13.436  37.07   28.636  0.00    0.00   C+0
HETATM  13      C       UNK     0       14.971  36.877  28.76   0.00    0.00   C+0
HETATM  14      C       UNK     0       15.715  37.473  27.535  0.00    0.00   C+0
HETATM  15      C       UNK     0       17.259  37.31   27.599  0.00    0.00   C+0
HETATM  16      C       UNK     0       17.968  37.916  26.363  0.00    0.00   C+0
HETATM  17      O       UNK     0       19.292  37.74   26.486  0.00    0.00   O+0
HETATM  18      P       UNK     0       20.358  38.163  25.535  0.00    0.00   P+0
HETATM  19      O       UNK     0       21.678  37.76   26.083  0.00    0.00   O+0
HETATM  20      O       UNK     0       20.332  39.635  25.364  0.00    0.00   O+0
HETATM  21      O       UNK     0       20.156  37.507  24.221  0.00    0.00   O+0
HETATM  22      H       UNK     0       17.596  37.416  25.462  0.00    0.00   H+0
HETATM  23      H       UNK     0       17.726  38.983  26.304  0.00    0.00   H+0
HETATM  24      H       UNK     0       17.513  36.248  27.661  0.00    0.00   H+0
HETATM  25      H       UNK     0       17.642  37.805  28.497  0.00    0.00   H+0
HETATM  26      H       UNK     0       15.475  38.539  27.464  0.00    0.00   H+0
HETATM  27      H       UNK     0       15.346  36.982  26.628  0.00    0.00   H+0
HETATM  28      H       UNK     0       15.331  37.368  29.668  0.00    0.00   H+0
HETATM  29      H       UNK     0       15.202  35.81   28.831  0.00    0.00   H+0
HETATM  30      H       UNK     0       13.224  38.139  28.555  0.00    0.00   H+0
HETATM  31      H       UNK     0       13.096  36.583  27.719  0.00    0.00   H+0
HETATM  32      H       UNK     0       12.967  36.982  30.75   0.00    0.00   H+0
HETATM  33      H       UNK     0       12.838  35.425  29.914  0.00    0.00   H+0
HETATM  34      H       UNK     0       10.892  37.775  29.592  0.00    0.00   H+0
HETATM  35      H       UNK     0       10.763  36.219  28.756  0.00    0.00   H+0
HETATM  36      H       UNK     0       10.522  35.056  30.957  0.00    0.00   H+0
HETATM  37      H       UNK     0       10.651  36.613  31.794  0.00    0.00   H+0
HETATM  38      H       UNK     0       8.566   37.396  30.659  0.00    0.00   H+0
HETATM  39      H       UNK     0       8.437   35.837  29.822  0.00    0.00   H+0
HETATM  40      H       UNK     0       8.34    36.231  32.862  0.00    0.00   H+0
HETATM  41      H       UNK     0       8.211   34.673  32.025  0.00    0.00   H+0
HETATM  42      H       UNK     0       6.252   37.014  31.726  0.00    0.00   H+0
HETATM  43      H       UNK     0       6.123   35.455  30.889  0.00    0.00   H+0
HETATM  44      H       UNK     0       6.031   35.849  33.928  0.00    0.00   H+0
HETATM  45      H       UNK     0       5.902   34.291  33.091  0.00    0.00   H+0
HETATM  46      H       UNK     0       3.803   35.067  31.971  0.00    0.00   H+0
HETATM  47      H       UNK     0       3.932   36.624  32.808  0.00    0.00   H+0
HETATM  48      H       UNK     0       3.602   33.9    34.174  0.00    0.00   H+0
HETATM  49      H       UNK     0       3.73    35.458  35.011  0.00    0.00   H+0
HETATM  50      H       UNK     0       1.617   36.228  33.902  0.00    0.00   H+0
HETATM  51      H       UNK     0       1.488   34.671  33.066  0.00    0.00   H+0
HETATM  52      H       UNK     0       0       34.736  35.051  0.00    0.00   H+0
HETATM  53      H       UNK     0       1.256   33.503  35.267  0.00    0.00   H+0
HETATM  54      H       UNK     0       1.386   35.067  36.107  0.00    0.00   H+0
$

nawk -f crunch.awk myFile
crunch.awk:

BEGIN{OFS="\t"; ARGV[ARGC++] = ARGV[1] }
function abs(i) {return (i<0)?-i:i}

FNR==NR{
  for(i=1;i<=NF;i++)
    if ($i<0)
      m=($i<m)?$i:m
  next
}
{
  for(i=1;i<=NF;i++)
    if ($i<0)
     $i+=abs(m)
  print
}

I apologise for the confusion: I treated the four rows independently.

What I want to do is take the largest negative numbers of columns 6-8, make them positive, and add that positive number to every single number in their respective columns.

---------- Post updated at 06:19 PM ---------- Previous update was at 06:13 PM ----------

Thank you very much vgersh99 - that's doing almost exactly what I want. The only tiny thing is that even when numbers are positive in the column, I need them to have the largest negative number added to them if a negative number exists in that column.

I must apologise for these repeated requests and for not explaining myself clearly in the first instance.

Do you want to determine the largest negative among ALL the columns and add its absolute value to ALL the negative values for ALL the columns?
Or you want to determine the largest negative PER COLUMN and add its absolute value PER CORRESPONDING columns?

For the later, see my original post.
For the former, use this:

BEGIN{OFS="\t"; ARGV[ARGC++] = ARGV[1] }
function abs(i) {return (i<0)?-i:i}

FNR==NR{
  for(i=1;i<=NF;i++)
    if ($i<0)
      m=($i<m)?$i:m
  next
}
{
  for(i=1;i<=NF;i++)
    if ($i<0)
     $i+=abs(m)
  print
}
1 Like

ok, try this:

BEGIN{OFS="\t"; ARGV[ARGC++] = ARGV[1] }
function abs(i) {return (i<0)?-i:i}

FNR==NR{
  for(i=1;i<=NF;i++)
    if ($i<0)
      m=($i<m)?$i:m
  next
}
{
  for(i=1;i<=NF;i++)
    if (i in m)
     $i+=abs(m)
  print
}
1 Like

You're doing better than usual, it sometimes takes 4 pleas for a poster to show their data, let alone admit what they want :slight_smile:

Working on it...

---------- Post updated at 11:29 AM ---------- Previous update was at 11:26 AM ----------

$ cat least.awk
BEGIN {
        OFS="\t"
        while(getline < FILE)
        for(N=6; N<=8; N++) if($N < MIN[N]) MIN[N]=$N
        close(FILE);

        while(getline < FILE)
        {
                for(N=6; N<=8; N++) $N -= MIN[N]
                print
        }
        exit
}
$ awk -f least.awk -v FILE="data2"
HETATM  1       C       UNK     0       0       7.865   8.47    0.00    0.00   C+0
HETATM  2       C       UNK     0       0.771   8.448   7.265   0.00    0.00   C+0
HETATM  3       C       UNK     0       2.307   8.257   7.385   0.00    0.00   C+0
HETATM  4       C       UNK     0       3.077   8.844   6.174   0.00    0.00   C+0
$

---------- Post updated at 11:30 AM ---------- Previous update was at 11:29 AM ----------

You can modify argv? That's a clever way of getting the same file twice! :slight_smile:

1 Like

Both of the methods you guys have suggested work perfectly and I'm really thankful to you both.

But er ... I've realised that what I actually need is to determine the largest negative among ALL the columns and add its absolute value to ALL the positive AND negative values for ALL the columns

THAT is definitely what I need. I'm very sorry: is there a simple modification to one of the methods to be able to do this? I don't have any real experience with awk.

Immense thanks for any further help.

ALL the columns? Even the ones that say HETATM, C, and UNK?

You'd better just tell us which columns.

Or better yet: An example of your input and output data!!

---------- Post updated at 04:30 PM ---------- Previous update was at 04:24 PM ----------

If I've guessed the correct "all":

$ cat least.awk
BEGIN { OFS="\t"
        while(getline < FILE)
        for(N=6; N<=10; N++) if($N < MIN) MIN=$N
        close(FILE);

        while(getline < FILE)
        {
                for(N=6; N<=10; N++) $N -= MIN;
                print
        }
        exit
}

$ awk -v FILE=data2 -f least.awk
HETATM  1       C       UNK     0       0       33.504  34.109  25.639  25.639 C+0
HETATM  2       C       UNK     0       0.771   34.087  32.904  25.639  25.639 C+0
HETATM  3       C       UNK     0       2.307   33.896  33.024  25.639  25.639 C+0
HETATM  4       C       UNK     0       3.077   34.483  31.813  25.639  25.639 C+0
$
1 Like

Yes you guessed correctly - I feel like I've learnt a lot about how to explain specific problems specifically through all these mistakes! :o

Thanks so much guys - you've really helped out a PhD student in need :smiley:

just for reference:

BEGIN{OFS="\t"; ARGV[ARGC++] = ARGV[1] }
function abs(i) {return (i<0)?-i:i}

FNR==NR{
  for(i=1;i<=NF;i++)
    if ($i<0) {
      m=($i<m)?$i:m
      nC
  }
  next
}
{
  for(i=1;i<=NF;i++)
    if (i in nC)
     $i+=abs(m)
  print
}
1 Like