Code to count the frequency of interacting pairs

Tzole · February 20, 2013, 8:35am

Hi all,

I am trying to analyze my data and would appreciate your help.

I have some files in the following format:

  res1 = TYR res2 = ASN 
  res1 = ASP res2 = SER
  res1 = TYR res2 = ASN
  res1 = THR res2 = LYS 
  res1 = THR res2 = TYR

etc. (many lines)

I am trying to find the frequency of the interacting pairs above.
The list of these residues is (let's say in aminoacids.in):

  ALA
  ARG
  ASN
  ASP
  CYS
  GLN
  GLU
  GLY
  HIS
  ILE
  LEU
  LYS
  MET
  PHE
  PRO
  SER
  THR
  TRP
  TYR
  VAL

So the output file will be something like this (example):

  TYR - ASN = 50 times
  THR - TYR = 39 times

Etc.

Any ideas?
Thank you in advance :)

pamu · February 20, 2013, 8:53am

I am completely blank on this..

Please provide some more details.

user8 · February 20, 2013, 9:01am

You could start with:

awk '{a[$3 " - " $6]++} END {for (i in a) print i, "=", a[i]}'
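
The same idea written out as a small shell function may be easier to follow. This is only a sketch: it assumes every line has the form `res1 = XXX res2 = YYY`, so that the residue codes are whitespace-separated fields 3 and 6. The function name `count_pairs` and the "times" suffix are made up for illustration. awk does not guarantee any particular order for `for (pair in count)`, so the output is piped through `sort`.

```shell
# count_pairs is a hypothetical name chosen for this sketch.
# It reads from the files given as arguments, or from standard input.
count_pairs() {
    awk '
        # Fields are split on whitespace: res1 = TYR res2 = ASN
        # so field 3 is the res1 code and field 6 is the res2 code.
        # Lines with fewer than 6 fields (e.g. blank lines) are skipped.
        NF >= 6 { count[$3 " - " $6]++ }

        # After the last line, print every pair with its count.
        END {
            for (pair in count)
                print pair, "=", count[pair], "times"
        }
    ' "$@"
}

# Example on the five sample lines from the first post.
printf '%s\n' \
    'res1 = TYR res2 = ASN' \
    'res1 = ASP res2 = SER' \
    'res1 = TYR res2 = ASN' \
    'res1 = THR res2 = LYS' \
    'res1 = THR res2 = TYR' |
    count_pairs | sort
# ASP - SER = 1 times
# THR - LYS = 1 times
# THR - TYR = 1 times
# TYR - ASN = 2 times
```

To run it on a file instead of standard input, pass the file name as an argument, e.g. `count_pairs pairs.txt` (a hypothetical file name). Note that TYR - ASN and ASN - TYR are counted as different pairs here.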

Tzole · February 20, 2013, 9:06am

The third and sixth columns hold the three-letter codes of the amino acids. I have more than 1000 lines of these pairs, and I am trying to count how many times each pair appears in the file.
In my example, the first and third lines are the same, so the result is:
res1 = TYR res2 = ASN : 2 times
I hope that helps.

---------- Post updated at 03:06 PM ---------- Previous update was at 03:02 PM ----------

I must learn to use awk!!! Thank you user8 and pamu

user8, thank you so much, it works.