Add similar pairs in a txt file

Tzole · March 7, 2013, 8:24am

Hi guys!!!

In my txt file there are a lot of pairs. Some of them are the same pair written in reverse order (e.g. ARG - ASN and ASN - ARG), so I am trying to add the values of these pairs together. For example:

File:

  ASP - GLN = 14
  SER - GLU = 14
  ARG - ASN = 13
  ARG - TYR = 13
  ASP - ARG = 13
  GLU - ARG = 13
  GLU - GLN = 13
  ALA - ARG = 12
  ARG - GLN = 12
  ASN - ARG = 12
  ASN - ASP = 12
....

I want the output file to contain each such pair only once, with the values added:

Output file:

  ASP - GLN = 14
  SER - GLU = 14
  ARG - ASN = 25
  ARG - TYR = 13
  ASP - ARG = 13
  GLU - ARG = 13
  GLU - GLN = 13
  ALA - ARG = 12
  ARG - GLN = 12
  ASN - ASP = 12
.....

Any solutions? Thanks!!!

rbatte1 · March 7, 2013, 8:44am

Is your file fixed width? If so, a (rather clunky) script could do this:-

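#!/bin/ksh

# Take everything before the "=" as the key, one key per line.
# Note: this only sums identical keys; "ARG - ASN" and "ASN - ARG" stay separate.
cut -d"=" -f1 filename | sort -u | while read refs
do
   tot=0
   # In ksh the last part of a pipeline runs in the current shell, so tot survives the loop.
   grep "$refs" filename | while read a b c d value
   do
      ((tot=tot+value))
   done
   print "$refs = $tot"
done > newfile

Probably much neater and more efficient with an awk, but I'm not proficient enough to write them.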

I hope that this helps.

Robin
Liverpool/Blackburn
UK

Tzole · March 7, 2013, 9:13am

Hi rbatte1. Thanks for your reply. Unfortunately I am not familiar with ksh.
I tried to use this script but I couldn't run it :o

ctsgnb · March 7, 2013, 10:25am

# cat foot
  ASP - GLN = 14
  SER - GLU = 14
  ARG - ASN = 13
  ARG - TYR = 13
  ASP - ARG = 13
  GLU - ARG = 13
  GLU - GLN = 13
  ALA - ARG = 12
  ARG - GLN = 12
  ASN - ARG = 12
  ASN - ASP = 12
# awk '{i=($3<$1?$3" - "$1:$1" - "$3);A[i]+=$5}END{for(i in A) print i" = " A[i]}' foot
ARG - TYR = 13
ARG - ASP = 13
ARG - GLU = 13
GLU - SER = 14
ALA - ARG = 12
ARG - GLN = 12
ASP - GLN = 14
ASN - ASP = 12
GLN - GLU = 13
ARG - ASN = 25
#

Tzole · March 7, 2013, 10:41am

ctsgnb, exactly what I need. Thanks!!!!!

MadeInGermany · March 7, 2013, 12:22pm

Nice method to find the key!
Good that the output order doesn't matter here!
An attempt to keep the original order makes it more complex and increases memory consumption.

awk '{i=($3<$1?$3" - "$1:$1" - "$3)} !(i in A) {B[++n]=i}  {A[i]+=$5}END{for(i=1;i in B;i++) print B[i]" = " A[B[i]]}'

Scrutinizer · March 7, 2013, 1:59pm

Another one that preserves order:

awk 'NR==FNR{if(T[$3,$1]x) T[$3,$1]+=$5; else T[$1,$3]+=$5; next} $5=T[$1,$3]x' file file
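
For anyone puzzling over the one-liner above, here is a commented sketch of the same two-pass idea. It assumes the "A - B = N" layout from the first post. The function name sum_pairs_two_pass is made up for illustration, and it uses "in" tests rather than the T[...]x trick, so it is a variation and not an exact copy.

```shell
# Hypothetical helper: $1 is the input file, which awk reads twice.
sum_pairs_two_pass() {
    awk '
        # Pass 1 (NR == FNR only while reading the first copy):
        # add the value to whichever orientation of the pair was seen first.
        NR == FNR {
            if (($3, $1) in T) T[$3, $1] += $5
            else               T[$1, $3] += $5
            next
        }
        # Pass 2: print a line only if its orientation owns a total,
        # with the count replaced by the sum. The delete stops an
        # identical repeated line from being printed a second time.
        ($1, $3) in T {
            print $1, $2, $3, $4, T[$1, $3]
            delete T[$1, $3]
        }
    ' "$1" "$1"
}
```

Called as sum_pairs_two_pass file, it keeps the original line order because the second pass walks the file again from the top. The delete is my own addition; leave it out if repeated identical lines should each be printed.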

DGPickett · March 7, 2013, 4:17pm

I like sort merge, more robust. I use ksh88+/bash, but I expect awk could do it simpler:

labc= le=0
 
(
 sort foot
 echo 
 ) | while read a b c d e
do
 abc="$a $b $c"
 
 case "$labc" in
  ("$abc") # match
   (( le += e ))
   ;;
  (?*) # mismatch not blank
   echo "$labc = $le"
   ;& # intentional fall through
  (*) # blank = first
   labc="$abc" le="$e"
   ;;
  esac
 done
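
As posted, the loop above only merges lines whose keys are identical after sorting, so ARG - ASN and ASN - ARG would still come out as two lines. A possible sort-merge sketch that first puts each pair into alphabetical order is below. The function name sum_pairs_sorted is made up, and it assumes five whitespace-separated fields per line as in the sample file.

```shell
# Hypothetical helper: $1 is the input file.
sum_pairs_sorted() {
    # Step 1: swap the two names where needed so each pair is in
    #         alphabetical order (lines without 5 fields are skipped).
    # Step 2: sort, so equal pairs end up on adjacent lines.
    # Step 3: walk the sorted lines and sum each run of equal keys.
    awk 'NF == 5 { if ($3 < $1) { t = $1; $1 = $3; $3 = t }
                   print $1, $2, $3, $4, $5 }' "$1" |
    LC_ALL=C sort |
    awk '
        { key = $1 " " $2 " " $3 }
        NR > 1 && key != prev { print prev " = " total; total = 0 }
        { prev = key; total += $5 }
        END { if (NR > 0) print prev " = " total }
    '
}
```

The output comes out sorted by pair rather than in the original order, which is the usual trade-off with sort merge: nothing has to be held in memory except the current key and its running total.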