Iterating over a list using awk, when variable

cavanac2 · December 17, 2011, 7:18am

Hi,

I've recently started using awk because I need to pull information out of a file and compute the mean of a series of numbers. Here is the code I run on my Infile, and it works just fine:

awk  '{if ($1 == "Mam189") print $0}' Infile | awk '{if ($1 != $2) print $0}' | awk '{sum+=$3} END { print "Mam189 = ",sum/NR}'
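The three chained awk calls can be folded into a single awk pass over the file. Below is a minimal sketch wrapped in a shell function; the name `mam189_mean` is hypothetical and does not come from the thread. It counts the matching rows itself instead of relying on `NR`, and it skips the division when no row matches, which the original pipeline does not guard against.

```shell
# Hypothetical wrapper name; pass the input file as $1, e.g. mam189_mean Infile
mam189_mean() {
  awk '
    # Keep rows for Mam189 and skip rows that compare Mam189 with itself,
    # adding up column 3 and counting the rows that pass both tests.
    $1 == "Mam189" && $1 != $2 { sum += $3; n++ }
    # n plays the role NR played in the last awk of the pipeline;
    # the guard avoids a division by zero when no row matches.
    END { if (n) print "Mam189 = ", sum / n }
  ' "$1"
}
```

Usage: `mam189_mean Infile`. The output format is the same as the pipeline's final `print`.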

However, I need to run this code thousands of times, replacing the word "Mam189" with a different Mam# each time.

Essentially, I would like to feed in a list like this:
Mam189
Mam220
Mam18
Mam5
Mam100

and then do something like a for loop:

for i in `cat list`; 
do;
awk  '{if ($1 == "$i") print $0}' Infile | awk '{if ($1 != $2) print $0}' | awk '{sum+=$3} END { print "$i= ",sum/NR}'
done;

This, however, does not work. Does anyone know of a workaround or a different approach I could take?
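The loop above fails for two reasons. `do;` is a syntax error in sh/bash. And because each awk program is in single quotes, the shell never substitutes `$i`, so awk compares column 1 with the literal text `$i`. A common fix is to hand the value to awk with its `-v` option. Here is a sketch under those assumptions, with `means_for_list` as a hypothetical function name taking the list file and the data file:

```shell
# Hypothetical helper: means_for_list LISTFILE DATAFILE
# Prints "ID =  mean" for every ID in LISTFILE that has matching rows.
means_for_list() {
  # Read one ID per line; the extra test keeps a last line with no newline.
  while IFS= read -r id || [ -n "$id" ]; do
    # -v copies the shell variable into the awk variable "id".
    # Inside single quotes the shell never expands $i, which is why
    # the original loop searched for the literal text $i.
    # awk reads the file named by $2, not stdin, so it does not swallow the list.
    awk -v id="$id" '
      $1 == id && $1 != $2 { sum += $3; n++ }
      END { if (n) print id " = ", sum / n }
    ' "$2"
  done < "$1"
}
```

Usage: `means_for_list list Infile`. This still rereads the data file once per name, which adds up over thousands of names. A single pass that keeps per-name sums in awk arrays avoids that cost.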

radoulov · December 17, 2011, 5:19pm

If you post a sample of the original, unmodified input and an example of the expected output, it would definitely be easier to help.

cavanac2 · December 17, 2011, 6:20pm

Hi,

Apologies, I've been looking at this for so long that I forgot it's not actually that clear.

The original file I am working with looks something like this:

The first column contains DNA sequence data for Mam189, Mam82 and Mam426, which I am comparing to other sequence data listed in the second column. The third column is the sequence similarity between the data in columns 1 and 2.

The code above takes every row in which Mam189 is compared to another sequence (skipping Mam189 compared with itself) and averages the third column. The output would read:

I would like to feed in a list of names so that the code iterates over it and gives the average for each of Mam189, Mam82 and Mam426.

I've only used for loops before, but that won't work in this case, and I'm not sure what the best approach is.

Any advice would be much appreciated.

dude2cool · December 17, 2011, 8:56pm

Try this:

$ gawk 'NR==FNR{ a[$1] += $3; b[$1]++} NR!=FNR{ for(key in a) {if($1==key)print key,a[key]/b[key]}}' /tmp/1 /tmp/2

Mam189 72.4167
Mam426 78.6133

I used the following sample files:

radoulov · December 18, 2011, 5:18am

In addition, this will calculate the average per DNA sequence for all the entries:

awk 'END {
  for (d in dna)
    print d, dna[d]/count[d]
  }
{
  dna[$1]   += $3
  count[$1] ++
  }' infile
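Both replies read the data only once, using awk arrays keyed on column 1, but neither drops the rows where column 1 equals column 2, which the original pipeline filtered out. The sketch below combines the list file from the first reply, that filter, and a `$1 in want` lookup in place of looping over every key for each line. The function name `means_in_one_pass` is hypothetical. It relies on the common `NR == FNR` idiom, which is true only while awk is reading its first file:

```shell
# Hypothetical helper: means_in_one_pass LISTFILE DATAFILE
# Prints "ID mean" for each listed ID that has rows other than self-comparisons.
means_in_one_pass() {
  awk '
    # NR == FNR holds only while reading the first file (the list):
    # remember each wanted ID, then skip to the next line.
    NR == FNR { want[$1]; next }
    # Data file: keep listed IDs and drop rows comparing an ID with itself.
    ($1 in want) && $1 != $2 { sum[$1] += $3; n[$1]++ }
    # One output line per ID; for-in order is unspecified, hence sort below.
    END { for (id in sum) print id, sum[id] / n[id] }
  ' "$1" "$2" | sort
}
```

Usage: `means_in_one_pass list infile`. If the list file is empty, `NR == FNR` stays true while reading the data file as well, so nothing is printed.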

cavanac2 · December 18, 2011, 1:32pm

Awesome, thanks, that works great!

Best,