Uniq count second column

Hello

How can I count the number of occurrences in this file?

ERR315389.1000156       CTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAG
ERR315389.1000281       GCGTCTGGCAACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAA
ERR315389.1000504       GGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAGAAGGCAAATGTGCCGAGCTTGAAGAAGAATTGAAAACTGTGAC
ERR315389.1000637       GCTGGTGTCACTGCAAAAGAAACTCAAGGGCACCGAAGATGAACTGGACAAATACTCTGAGGCTCTCAAAGATGCCCAGGAGAAGCTGGAGCTGGC
ERR315389.1000647       CGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGAGAACGCCTT
ERR315389.1000762       AAAGCATTGATGACTTAGAAGACGAGCTGTACGCTCAGAAACTGAAGTACAAAGCCATCAGCGAGGAGCTGGACCACGCTCTCAACGATATGACTT
ERR315389.1000854       AGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCATTGAGAGCGACC
ERR315389.1001141       AAAAAGGCCACCGATGCTGAAGCCGACGTAGCTTCTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCA
ERR315389.1001145       GCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGA
ERR315389.1001393       CAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAA

I tried cat file1 | uniq -cf1 > file2 to count the occurrences, but the counts it gives are not what I want:

     11 ERR315389.1502254       CTCCGCCCGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGA
     12 ERR315389.6544981       NTCCGCCCGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGA
     24 ERR315389.4012310       CCGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCG
     24 ERR315389.5696434       CGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGA
     36 ERR315389.456083        CCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAA
     12 ERR315389.894063        CGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAG
     12 ERR315389.1554704       CTCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAG
     24 ERR315389.5277557       CGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAG
     60 ERR315389.2681352       GCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGG
    144 ERR315389.452044        CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA

How can I get the count based on the second column, regardless of which name appears in the first column? The first column is just an arbitrary name for the sequence in the second column.

For instance, given this raw file:

ERR315389.1451218       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.1640056       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.3946553       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.4137809       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.452044        CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.4597314       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.4896643       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.5450210       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.6159786       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA
ERR315389.7443074       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA

the desired output, counting the occurrences in the 2nd column, would be:

10 ERR315389.1451218       CGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGA

Thank you

uniq needs sorted input. As man uniq notes, it does not detect repeated lines unless they are adjacent.
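To see the adjacency requirement in isolation, here is a small sketch on made-up data (the IDs id1..id4 and the sequences AAA/CCC are placeholders, not taken from the file above). It runs uniq -c -f1 once on unsorted input and once after sorting on the second field:

```shell
# Made-up tab-separated sample: IDs are arbitrary, AAA repeats but not adjacently.
tmp=$(mktemp)
printf '%s\t%s\n' \
    id1 AAA \
    id2 CCC \
    id3 AAA \
    id4 AAA > "$tmp"

# Unsorted: uniq only merges neighbouring lines, so AAA is reported twice
# (once for id1, once for the id3/id4 run).
unsorted=$(uniq -c -f1 "$tmp")

# Sorted on the second field first: all AAA lines become adjacent
# and are merged into a single count.
sorted=$(sort -k2 "$tmp" | uniq -c -f1)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted" "$sorted"
rm -f "$tmp"
```

The unsorted run should list AAA on two separate lines, while the sorted run should give one AAA line with the full count.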

Unfortunately, your sample has only unique second fields so it can't be tested against. Adding a few non-unique lines, this may lead to the desired result:

sort -k2 file | uniq -cf1
      1 ERR315389.1000637       GCTGGTGTCACTGCAAAAGAAACTCAAGGGCACCGAAGATGAACTGG. . .
      7 ERR315389.1000504       GGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGC. . .
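If sorting a large file is a concern, a different technique (not the sort | uniq approach shown above) is to tally the second field in an awk array. This is only a sketch on made-up data; the names count and first are my own, and it assumes the distinct sequences fit in memory:

```shell
# Made-up tab-separated sample; in practice point awk at the real file instead.
tmp=$(mktemp)
printf '%s\t%s\n' id1 AAA id2 CCC id3 AAA id4 AAA > "$tmp"

# count[$2] tallies each sequence; first[$2] keeps the first ID seen for it,
# since the ID is only an arbitrary label for the sequence.
counts=$(awk '
    !($2 in count) { first[$2] = $1 }
    { count[$2]++ }
    END { for (seq in count) print count[seq], first[seq], seq }
' "$tmp" | sort -rn)

printf '%s\n' "$counts"
rm -f "$tmp"
```

Only the much shorter summary is sorted here, by descending count, rather than the whole input.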

Thanks! As you said, uniq only counts redundant adjacent lines, so it has to be combined with sort. I combined the two to get the desired output. Here is my code:

 cat file |sort -k1 -u | sort -k2 | uniq -cf1| sort -rn

The output is:

    633 ERR315389.1008500       GAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCT
    519 ERR315389.1012317       CGAAGATGAACTGGACAAATACTCTGAGGCTCTCAAAGATGCCCAGGAGAAGCTGGAGCTGGCAGAGAAAAAGGCCACCGATGCTGAAGCCGACGTAGCTT
    500 ERR315389.1004436       CTTGGATCGAGCTGAGCAGGCGGAGGCCGACAAGAAGGCGGCGGAAGACAGGAGCAAGCAGCTGGAAGATGAGCTGGTGTCACTGCAAAAGAAACTCAAGG
    481 ERR315389.1029324       GTTGGATCGTGCCCAGGAGCGTCTGGCAACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGA
    464 ERR315389.10163 CTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCTTTCCGACAAGCTGAAGGAGGCTGAGA
    369 ERR315389.1010914       CCGAGCTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAG
    365 ERR315389.1010286       CTGAGCTCTCAGAAGGCAAATGTGCCGAGCTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAG
    342 ERR315389.1005391       CTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATGACTTAGAAGACGAGCTGTACGCTCAGAAACTGAAGTACAAAGCCATC
    296 ERR315389.1005033       AAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCATT
    289 ERR315389.1001141       AAAAAGGCCACCGATGCTGAAGCCGACGTAGCTTCTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCAACAGC

Thanks again!

Don't use cat; it's a waste of resources, since sort can read the file itself.
Are you sure you want sort -u? It may spoil your count results...
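A small sketch of both points, on made-up data in which the line "id3 AAA" deliberately appears twice. With -k1 the sort key runs from the first field to the end of the line, so sort -k1 -u drops exact duplicate lines before they are counted. Whether that is wanted depends on whether repeated lines in the real file are genuine reads, which I can't tell from here:

```shell
# Made-up tab-separated sample: the "id3 AAA" line occurs twice.
tmp=$(mktemp)
printf '%s\t%s\n' id1 AAA id2 CCC id3 AAA id3 AAA > "$tmp"

# No cat, no sort -u: sort reads the file directly, every line is counted.
plain=$(sort -k2 "$tmp" | uniq -c -f1 | sort -rn)

# With the extra sort -k1 -u step: the duplicated line is removed first,
# so the count for AAA comes out one lower.
deduped=$(sort -k1 -u "$tmp" | sort -k2 | uniq -c -f1 | sort -rn)

printf 'plain:\n%s\n\nwith sort -u:\n%s\n' "$plain" "$deduped"
rm -f "$tmp"
```

If the IDs in the first column are all unique, the two pipelines should give the same counts and the sort -u step is just extra work.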