compare values in different lines of file

Hi everybody,

I have a file that looks like:

A B C D -1 0
E F G H -2 0
I J K L +1
M N O P -6

I would like to compare $5 of each line with $5 of the next line. If both values are negative, I want to calculate their mean, write it into the first line and delete the second line. If the two $5 values differ in sign, I want to leave both lines as they are. The averaging procedure also applies when both values are positive.

Could anyone help me with that problem???

Thank you so much,

Christine

I'm not sure exactly what your output should look like. If you can give an example of the output, that would make this easier to resolve.

From what I understood from your post, check this code:

#!/usr/bin/perl

use strict;
use warnings;

my $prev_col_5_value = '';
my $prev_line ;
open my $fh, '<', 'abc.txt' or die $!;
while(<$fh>){
        chomp;
        my $line = $_;
        my @cols = split(/ /,$line);
        if ($prev_line){
                if ($prev_col_5_value < 0 && $cols[4] < 0 ) {
                        my $curr_col4_value = ($cols[4] + $prev_col_5_value )/2;
                        print "$prev_line\n";
                        $prev_line = '';
                        $prev_col_5_value = '';
                }
                else {
                        print "$prev_line\n";
                        $prev_line = $line;
                        $prev_col_5_value = $cols[4];
                        #print or dont print as per ur req

                }
        }
        else {
                $prev_line = $line;
                $prev_col_5_value = $cols[4];
        }
}
print "$prev_line \n" if ($prev_line);
close $fh;

Output

A B C D -1 0
I J K L +1
M N O P -6

Lemme know if this is not what it was meant to generate

Cheers,

Hi,

Thank you for the fast help! This is pretty much what the output should be. The only thing missing is the mean value: the first line should have -1.5 in the 5th column, since (-1+(-2))/2 = -1.5. That is, whenever both values are negative or both are positive, a mean value should be calculated.

So the output should look like:

A B C D -1.5 0
I J K L +1
M N O P -6

Could we do that?

Thanx again :slight_smile:
Christine

---------- Post updated at 09:46 AM ---------- Previous update was at 09:30 AM ----------

One more question: Would it also be possible with an awk script? I am much more familiar with awk than with Perl :wink:

Thanx a lot,
Christine

---------- Post updated at 12:26 PM ---------- Previous update was at 09:46 AM ----------

Okay, here is my original file:

ATOM 0 BB SER 1 0 -31.958 -25.125 -11.061 1.00 0.00 -0.8
ATOM 1 BB GLY 1 1 -32.079 -26.085 -14.466 1.00 0.00 -0.4
ATOM 2 BB VAL 1 2 -36.455 -21.265 -15.792 1.00 0.00 4.2
ATOM 3 BB SER 1 3 -37.401 -20.877 -19.029 1.00 0.00 -0.8
ATOM 4 BB ALA 1 4 -42.701 -21.232 -18.584 1.00 0.00 1.8
ATOM 5 BB VAL 1 5 -47.498 -23.718 -18.979 1.00 0.00 4.2
ATOM 6 BB THR 1 6 -47.989 -24.426 -21.973 1.00 0.00 -0.7
ATOM 7 BB ALA 1 7 -46.376 -27.080 -22.868 1.00 0.00 1.8
ATOM 8 BB VAL 1 8 -44.852 -28.570 -20.796 1.00 0.00 4.2

If the values in the last column of two consecutive lines are both positive or both negative, I want to calculate the mean value and write out only the line that originally contained the first value, with the mean in its last column. If the values differ in sign, I leave the lines as they are. So in this case the output should be:

ATOM 0 BB SER 1 0 -31.958 -25.125 -11.061 1.00 0.00 -0.6
ATOM 2 BB VAL 1 2 -36.455 -21.265 -15.792 1.00 0.00 4.2
ATOM 3 BB SER 1 3 -37.401 -20.877 -19.029 1.00 0.00 -0.8
ATOM 4 BB ALA 1 4 -42.701 -21.232 -18.584 1.00 0.00 3.0
ATOM 6 BB THR 1 6 -47.989 -24.426 -21.973 1.00 0.00 -0.7
ATOM 7 BB ALA 1 7 -46.376 -27.080 -22.868 1.00 0.00 3.0

I think the Perl script would work; I just cannot adapt it to my file :slight_smile:

I am really thankful for any help!

Hi Christine,

You could give this script a try. It runs with ksh:

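#!/bin/ksh
# Needs ksh93: the arithmetic below works on floating-point values.
prevline=""
while read -r line; do
  pval=${prevline##* }   # last field of the previous line
  val=${line##* }        # last field of the current line
  if (( val*pval > 0 )); then
    # same sign: print the previous line with the mean as its last field
    avg=$(( (val+pval)/2 ))
    printf "%s %1.1f\n" "${prevline% *}" "$avg"
    line=""
  elif [[ $prevline != "" ]]; then
    echo "$prevline"
  fi
  prevline=$line
done < infile
if [[ $prevline != "" ]]; then
  echo "$prevline"
fi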

Alternatively, here is an awk equivalent:

awk '{ if ( pval*$NF > 0 )
       { avg=(pval+$NF)/2
         sub(/ [^ ]*$/,"",prev)
         printf("%s %1.1f\n", prev, avg)
         $0=""
       }
       else if (prev != "")
         { print prev }
       sub(/ *$/,"")
       pval=prev=$0;
       sub(/.* /,"",pval)
     }
     END { if ( prev != "" )
           print prev
     }' infile
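
To try the awk version on a few lines without creating infile, here is a minimal sketch that wraps the same awk program in a shell function reading standard input. The name merge_pairs is made up for this example; the awk logic itself is unchanged.

```shell
#!/bin/sh
# merge_pairs is a hypothetical wrapper name, not part of the script above.
# It reads lines on standard input; when the last fields of two consecutive
# lines have the same sign, it prints the first line with the mean instead.
merge_pairs() {
  awk '{ if ( pval*$NF > 0 )
         { avg=(pval+$NF)/2
           sub(/ [^ ]*$/,"",prev)
           printf("%s %1.1f\n", prev, avg)
           $0=""
         }
         else if (prev != "")
           { print prev }
         sub(/ *$/,"")
         pval=prev=$0
         sub(/.* /,"",pval)
       }
       END { if ( prev != "" )
             print prev
       }'
}

# First three lines of the sample file from the post above
merge_pairs <<'EOF'
ATOM 0 BB SER 1 0 -31.958 -25.125 -11.061 1.00 0.00 -0.8
ATOM 1 BB GLY 1 1 -32.079 -26.085 -14.466 1.00 0.00 -0.4
ATOM 2 BB VAL 1 2 -36.455 -21.265 -15.792 1.00 0.00 4.2
EOF
```

The first two lines have the same sign, so they collapse into the SER line with -0.6 in the last column; the VAL line is printed unchanged. Note that a value of exactly 0 makes the product 0, so such a line is treated like a sign change and left as it is.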

#!/usr/bin/perl

use strict;
use warnings;

my $last_col = '';
my @new_arr;
open my $fh, '<', 'abc.txt' or die $!;
while(<$fh>){
        chomp;
        my @cols = split;
        if (@new_arr){
                if ($last_col < 0 && $cols[$#cols] < 0 ) {
                        #both -ve
                        my $curr_last_col_value = ($cols[$#cols] + $last_col )/2;
                        print join(' ',@new_arr,$curr_last_col_value)."\n";
                        @new_arr = ();
                        $last_col = '';
                }
                elsif ($last_col > 0 && $cols[$#cols] > 0){
                        # both +ve
                        my $curr_last_col_value = ($cols[$#cols] + $last_col )/2;
                        print join(' ',@new_arr,$curr_last_col_value)."\n";
                        @new_arr = ();
                        $last_col = '';
                }
                else {
                        #diff signs
                        #print or dont print as per ur req
                        print join(' ',@new_arr,$last_col) ."\n";
                        @new_arr = ();
                        push @new_arr, $cols[$_] for (0..$#cols-1);
                        $last_col = $cols[$#cols];
                }
        }
        else {
                push @new_arr, $cols[$_] for (0..$#cols-1);
                $last_col = $cols[$#cols];
        }
}
print join(' ',@new_arr,$last_col)."\n" if (@new_arr);
close $fh;

OK, this is the Perl script which will do the job for your file format.
Just replace abc.txt with the name of your file. Alternatively, if you want to run it on a sequence of files from the command line, remove the open and close statements and write while(<>) instead of while(<$fh>).
It could then be run as cat file1 file2 | perl scriptname
Note: I have added extra comments and self-explanatory variable names for easy understanding.

As for choosing between Perl and awk: I generally go with Perl whenever there are more computations and the files tend to be large. However, it's a matter of personal preference. Choose the one which suits you best.

Cheers,

open FH, '<', 'a.txt' or die $!;
my @tmp=<FH>;
for(my $i=0;$i<=$#tmp/2;$i++){
  my @t1=split(" ",$tmp[2*$i]);
  my @t2=split(" ",$tmp[2*$i+1]);
  if($t1[4]*$t2[4]>0){
    print join " ",(@t1[0..3],($t1[4]+$t2[4])/2,$t1[5]);
   print "\n";
  }
  else{
     print $tmp[2*$i];
     print $tmp[2*$i+1];
  }
}

Thank you all very much!

In the end I used the Perl script and it worked perfectly!

You made my day :slight_smile:

Have a nice evening,
Christine