I'm new to the forums and hope to be able to contribute something useful in the future. However, I must admit that what prompted me to join is that I currently need help with something that has me at the end of my tether.
I have a PDB (Protein Data Bank) file which I have condensed to the following for ease of modification:
What I want to do is take the largest negative number from each of the three columns after the column containing nothing but '0' values (-26.712 in the first column, for example), and add the positive value of those single numbers to every single value in their respective columns. This is because I require that there be no negative numbers in the output.
I've tried all sorts of combinations of (g)awk, sed, grep in Bash, and various Python scripts (which I think is probably a more suitable language for this sort of task) but nothing has done it.
I'm still a relative newbie so am probably being ignorant about something obvious; please bear with me. Any help would be greatly appreciated.
Thank you very much indeed for the quick responses. Corona, the code you have suggested is clearly almost exactly what I need, but some of the output coordinates seem to be slightly off. My desired output for the first 4 rows, with the original unchanged 4 rows placed beforehand for comparison, is as follows:
Do you care about the output spacing? It reduces it to single spaces here, but you could make it tabs with awk -v OFS="\t" ...
I don't understand this output at all. What formula do you get 0.771 from? How does the smallest negative number manage to not become zero when you subtract it from itself? Why don't the other two change when you're adding to every single column?
I think I get it. You're wanting the largest negative number in the entire file.
It has to process the data twice, since it won't know the least value until the data's finished.
$ cat least.awk
BEGIN {
    # Print output tab-separated
    OFS = "\t"

    # First pass: find the minimum from columns 6 through 8.
    # MIN starts out unset, which compares as 0, so only a negative
    # value can ever lower it.
    while ((getline < FILE) > 0)
        for (N = 6; N <= 8; N++) if ($N < MIN) MIN = $N

    # Close FILE so we can process it again from the start
    close(FILE)

    # Second pass: read and print each line, subtracting the min value
    # from columns 6-8. Testing "> 0" stops the loop if FILE is unreadable.
    while ((getline < FILE) > 0) {
        for (N = 6; N <= 8; N++) $N -= MIN
        print
    }

    # Quit right here, don't go into the main awk processing loop
    exit
}
$ gawk -f least.awk -v FILE="data" # Give it filename as FILE
HETATM 1 C UNK 0 1.073 34.577 35.182 0.00 0.00 C+0
HETATM 2 C UNK 0 1.844 35.16 33.977 0.00 0.00 C+0
HETATM 3 C UNK 0 3.38 34.969 34.097 0.00 0.00 C+0
HETATM 4 C UNK 0 4.15 35.556 32.886 0.00 0.00 C+0
HETATM 5 C UNK 0 5.684 35.36 33.013 0.00 0.00 C+0
HETATM 6 C UNK 0 6.466 35.944 31.806 0.00 0.00 C+0
HETATM 7 C UNK 0 7.998 35.743 31.943 0.00 0.00 C+0
HETATM 8 C UNK 0 8.781 36.326 30.738 0.00 0.00 C+0
HETATM 9 C UNK 0 10.312 36.126 30.875 0.00 0.00 C+0
HETATM 10 C UNK 0 11.103 36.705 29.675 0.00 0.00 C+0
HETATM 11 C UNK 0 12.634 36.495 29.83 0.00 0.00 C+0
HETATM 12 C UNK 0 13.436 37.07 28.636 0.00 0.00 C+0
HETATM 13 C UNK 0 14.971 36.877 28.76 0.00 0.00 C+0
HETATM 14 C UNK 0 15.715 37.473 27.535 0.00 0.00 C+0
HETATM 15 C UNK 0 17.259 37.31 27.599 0.00 0.00 C+0
HETATM 16 C UNK 0 17.968 37.916 26.363 0.00 0.00 C+0
HETATM 17 O UNK 0 19.292 37.74 26.486 0.00 0.00 O+0
HETATM 18 P UNK 0 20.358 38.163 25.535 0.00 0.00 P+0
HETATM 19 O UNK 0 21.678 37.76 26.083 0.00 0.00 O+0
HETATM 20 O UNK 0 20.332 39.635 25.364 0.00 0.00 O+0
HETATM 21 O UNK 0 20.156 37.507 24.221 0.00 0.00 O+0
HETATM 22 H UNK 0 17.596 37.416 25.462 0.00 0.00 H+0
HETATM 23 H UNK 0 17.726 38.983 26.304 0.00 0.00 H+0
HETATM 24 H UNK 0 17.513 36.248 27.661 0.00 0.00 H+0
HETATM 25 H UNK 0 17.642 37.805 28.497 0.00 0.00 H+0
HETATM 26 H UNK 0 15.475 38.539 27.464 0.00 0.00 H+0
HETATM 27 H UNK 0 15.346 36.982 26.628 0.00 0.00 H+0
HETATM 28 H UNK 0 15.331 37.368 29.668 0.00 0.00 H+0
HETATM 29 H UNK 0 15.202 35.81 28.831 0.00 0.00 H+0
HETATM 30 H UNK 0 13.224 38.139 28.555 0.00 0.00 H+0
HETATM 31 H UNK 0 13.096 36.583 27.719 0.00 0.00 H+0
HETATM 32 H UNK 0 12.967 36.982 30.75 0.00 0.00 H+0
HETATM 33 H UNK 0 12.838 35.425 29.914 0.00 0.00 H+0
HETATM 34 H UNK 0 10.892 37.775 29.592 0.00 0.00 H+0
HETATM 35 H UNK 0 10.763 36.219 28.756 0.00 0.00 H+0
HETATM 36 H UNK 0 10.522 35.056 30.957 0.00 0.00 H+0
HETATM 37 H UNK 0 10.651 36.613 31.794 0.00 0.00 H+0
HETATM 38 H UNK 0 8.566 37.396 30.659 0.00 0.00 H+0
HETATM 39 H UNK 0 8.437 35.837 29.822 0.00 0.00 H+0
HETATM 40 H UNK 0 8.34 36.231 32.862 0.00 0.00 H+0
HETATM 41 H UNK 0 8.211 34.673 32.025 0.00 0.00 H+0
HETATM 42 H UNK 0 6.252 37.014 31.726 0.00 0.00 H+0
HETATM 43 H UNK 0 6.123 35.455 30.889 0.00 0.00 H+0
HETATM 44 H UNK 0 6.031 35.849 33.928 0.00 0.00 H+0
HETATM 45 H UNK 0 5.902 34.291 33.091 0.00 0.00 H+0
HETATM 46 H UNK 0 3.803 35.067 31.971 0.00 0.00 H+0
HETATM 47 H UNK 0 3.932 36.624 32.808 0.00 0.00 H+0
HETATM 48 H UNK 0 3.602 33.9 34.174 0.00 0.00 H+0
HETATM 49 H UNK 0 3.73 35.458 35.011 0.00 0.00 H+0
HETATM 50 H UNK 0 1.617 36.228 33.902 0.00 0.00 H+0
HETATM 51 H UNK 0 1.488 34.671 33.066 0.00 0.00 H+0
HETATM 52 H UNK 0 0 34.736 35.051 0.00 0.00 H+0
HETATM 53 H UNK 0 1.256 33.503 35.267 0.00 0.00 H+0
HETATM 54 H UNK 0 1.386 35.067 36.107 0.00 0.00 H+0
$
I apologise for the confusion: I treated the four rows independently.
What I want to do is take the largest negative numbers of columns 6-8, make them positive, and add that positive number to every single number in their respective columns.
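The per-column version of this can be sketched in Python, which the first post mentions as an option. It is only an illustration under a few assumptions: fields are whitespace-separated, the coordinates sit in fields 6-8, the function name `shift_columns` is made up for this example, and the two sample rows are invented rather than taken from the real file. A column with no negative value is left alone.

```python
def shift_columns(lines, cols=(5, 6, 7)):
    """Shift each chosen column so that its most negative value becomes zero.

    `cols` holds zero-based field indexes; (5, 6, 7) are fields 6-8.
    Columns without a negative value are left unchanged.
    """
    rows = [line.split() for line in lines if line.strip()]

    # First pass: most negative value per column (0.0 if none is negative).
    lowest = {c: min([0.0] + [float(r[c]) for r in rows]) for c in cols}

    # Second pass: subtract that (negative) minimum from the whole column,
    # which is the same as adding its absolute value.
    out = []
    for r in rows:
        for c in cols:
            r[c] = f"{float(r[c]) - lowest[c]:.3f}"
        out.append("\t".join(r))
    return out


# Invented sample rows, for illustration only.
sample = [
    "HETATM 1 C UNK 0 -2.000 1.500 -0.250 0.00 0.00 C+0",
    "HETATM 2 C UNK 0 3.000 2.500 -1.000 0.00 0.00 C+0",
]
for line in shift_columns(sample):
    print(line)
```

Formatting with `.3f` keeps three decimals, so a value such as 35.160 is not shortened to 35.16. Note that a real PDB file uses fixed-width columns, so tab-separated output like this would still need re-aligning before other tools read it.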
Thank you very much vgersh99 - that's doing almost exactly what I want. The only tiny thing is that even when numbers are positive in the column, I need them to have the largest negative number added to them if a negative number exists in that column.
I must apologise for these repeated requests and for not explaining myself clearly in the first instance.
Do you want to determine the largest negative among ALL the columns and add its absolute value to ALL the negative values in ALL the columns?
Or do you want to determine the largest negative PER COLUMN and add its absolute value to the CORRESPONDING column?
For the latter, see my original post.
For the former, use this:
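# Queue the input file a second time so awk reads it twice
BEGIN { OFS = "\t"; ARGV[ARGC++] = ARGV[1] }

function abs(i) { return (i < 0) ? -i : i }

# First pass: m ends up as the most negative value in any field
FNR == NR {
    for (i = 1; i <= NF; i++)
        if ($i < 0)
            m = ($i < m) ? $i : m
    next
}

# Second pass: add abs(m) to the negative values only
{
    for (i = 1; i <= NF; i++)
        if ($i < 0)
            $i += abs(m)
    print
}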
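# Queue the input file a second time so awk reads it twice
BEGIN { OFS = "\t"; ARGV[ARGC++] = ARGV[1] }

function abs(i) { return (i < 0) ? -i : i }

# First pass: m[i] is the most negative value seen in column i.
# Only columns that contain a negative value get an entry in m.
FNR == NR {
    for (i = 1; i <= NF; i++)
        if ($i < 0)
            m[i] = ($i < m[i]) ? $i : m[i]
    next
}

# Second pass: shift every value in those columns by abs(m[i])
{
    for (i = 1; i <= NF; i++)
        if (i in m)
            $i += abs(m[i])
    print
}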
Both of the methods you guys have suggested work perfectly and I'm really thankful to you both.
But er ... I've realised that what I actually need is to determine the largest negative among ALL the columns and add its absolute value to ALL the positive AND negative values for ALL the columns.
THAT is definitely what I need. I'm very sorry: is there a simple modification to one of the methods to be able to do this? I don't have any real experience with awk.
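# Queue the input file a second time so awk reads it twice
BEGIN { OFS = "\t"; ARGV[ARGC++] = ARGV[1] }

function abs(i) { return (i < 0) ? -i : i }

# First pass: m is the most negative value in the whole file;
# nC records which columns contain a negative value.
FNR == NR {
    for (i = 1; i <= NF; i++)
        if ($i < 0) {
            m = ($i < m) ? $i : m
            nC[i]    # referencing the element is enough to create it
        }
    next
}

# Second pass: add abs(m) to every value in the recorded columns
{
    for (i = 1; i <= NF; i++)
        if (i in nC)
            $i += abs(m)
    print
}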
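For comparison, the final requirement (one minimum across all the coordinate columns, added to every value, positive or negative) might look like this in Python. This is a rough sketch, not a drop-in replacement: it assumes whitespace-separated fields with the coordinates in fields 6-8, the name `shift_by_global_min` is made up here, and the sample rows are invented. Unlike the awk above, it shifts all three columns even when one of them holds no negative value.

```python
def shift_by_global_min(lines, cols=(5, 6, 7)):
    """Add the absolute value of the single most negative number found
    in the chosen columns to every value in those columns.

    `cols` holds zero-based field indexes; (5, 6, 7) are fields 6-8.
    """
    rows = [line.split() for line in lines if line.strip()]

    # First pass: most negative value anywhere in the chosen columns
    # (0.0 when nothing is negative, so the data is then left unshifted).
    lowest = min([0.0] + [float(r[c]) for r in rows for c in cols])

    # Second pass: subtracting the negative minimum adds its absolute value.
    out = []
    for r in rows:
        for c in cols:
            r[c] = f"{float(r[c]) - lowest:.3f}"
        out.append("\t".join(r))
    return out


# Invented sample rows, for illustration only.
sample = [
    "HETATM 1 C UNK 0 -2.000 1.500 -0.250 0.00 0.00 C+0",
    "HETATM 2 C UNK 0 3.000 2.500 -1.000 0.00 0.00 C+0",
]
for line in shift_by_global_min(sample):
    print(line)
```

Because one offset is applied to all three coordinates, the whole structure is moved by the same amount along each axis and no value ends up below zero.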