count the unique records based on certain columns

Hi everyone,

I have a file result.txt with records as following and another file mirna.txt with a list of miRNAs e.g. miR22, miR123, miR13 etc.

Gene Transcript miRNA

Gar Nm_111233 miR22
Gar Nm_123440 miR22
Gar Nm_129939 miR22
Hel Nm_233900 miR13
Hel Nm_678900 miR13
Bart Nm_178181 miR22
Gar Nm_789999 miR43

Now I want to count the number of gene for each miRNA in mirna.txt

e.g.
miR22 2
miR13 1
miR15 0
miR43 1

Previously, I used the following command but it counts every occurence of miRNA.

for gene in `cat mirna.txt`; do awk -v gene=$gene '{for(i=1; i<=NF; i++) if ($i==gene) c++} END {print c}' result.txt>>output.txt; done;

Any help is appreciated. Thanks in advance.

Mic

This should count the ones present:

awk '
    NR > 1 && NF > 1 { c[$NF]++ }
    END {
        for( x in c )
            print x, c[x];
    }
'  result.txt 

Hi agama,

Thanks for your reply but the output I got using your script is:

miR43 1
miR13 2
miR22 3

I expect to get the following output instead
miR22 2
miR13 1
miR43 1

I interpreted your objective very incorrectly. Corrected code below:

 awk '
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )
            c[$NF]++
    }
    END {
        for( x in c )
            print x, c[x];
    }' input >output

Either one of these should give you the counts your looking for:

sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | awk '{a[$2]++} END{for(e in a){print e FS a[e]}}'
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | uniq -c
1 Like

A few changes to use the contents of your list file to print those that didn't appear with a zero.

awk '
    NR != FNR { list[$1] = 1; next; }
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )
            c[$NF]++
    }
    END {
        for( x in list )
            printf( "%s %d\n", x, c[x] );
    }
'  results.txt mirna.txt >output-file

Hi agama,

Thank you very much. The script works perfectly and I got the output desired. :b:

Cheers,
Mic

---------- Post updated at 03:58 PM ---------- Previous update was at 03:55 PM ----------

Hi spacebar,

Thanks for help!

Cheers,
Mic