Hello everyone,
For each read ID in column 2 of File 1 (e.g. Read ID: ERR315389.743357), I want to retrieve the matching information from columns 2, 3 and 4 of File 2. File 1 has ~42k lines and File 2 has ~700k lines. The desired output is:
Count Read ID Sequence Exon Transcript ID
100 ERR315389.6445937 CTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCAACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGA 4 ENST00000267996
File 1 was made by collapsing the redundant read IDs in File 2 with the uniq command and printing how many times each read ID occurs. For example:
96 ERR315389.743357 GAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGG
Here 96 means that this read ID occurs 96 times in File 2.
File 1
Count Read ID Sequence
96 ERR315389.743357 GAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGG
96 ERR315389.5907790 TGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGC
96 ERR315389.4298798 ATCAAGGTCCTTTCCGACAAGCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATGACTTAGAAGA
96 ERR315389.422020 ATCAAGGTCCTTTCCGACAAGCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATGACTTAGAAGA
96 ERR315389.2233748 ATCAAGGTCCTTTCCGACAAGCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATGACTTAGAAGA
96 ERR315389.2069419 ATCAAGGTCCTTTCCGACAAGCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATGACTTAGAAGA
92 ERR315389.6677500 AAGAGGCCAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGG
92 ERR315389.4058303 GAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCATTGAGAGCGACCTGGAACG
88 ERR315389.4648318 CATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAG
File 2
Read ID Transcript ID Exon Sequence
ERR315389.3990366 ENST00000267996 4 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000288398 4 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000317516 3 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000334895 3 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000357980 5 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000358278 4 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000403994 4 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000404484 3 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000558264 2 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
ERR315389.3990366 ENST00000558314 4 AAAAAAAATGGAAATTCAGGAGATCCAACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTCATCA
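The lookup described above could be sketched roughly as follows in Python. This is only a minimal sketch under some assumptions: both files are whitespace-delimited, each starts with the header line shown, and a read ID that matches several transcripts in File 2 should produce one output line per match. The function name join_reads is hypothetical and only for illustration.

```python
def join_reads(file1_lines, file2_lines):
    """Yield (count, read_id, sequence, exon, transcript_id) tuples.

    file1_lines: lines of the form "Count ReadID Sequence"
    file2_lines: lines of the form "ReadID TranscriptID Exon Sequence"
    Assumption: fields are separated by whitespace.
    """
    # File 1 is the smaller file (~42k lines), so keep it in memory:
    # map each read ID to its count.
    counts = {}
    for line in file1_lines:
        fields = line.split()
        # Skip blank lines and the "Count Read ID Sequence" header.
        if len(fields) < 2 or fields[0] == "Count":
            continue
        counts[fields[1]] = fields[0]

    # Stream File 2 (~700k lines) and keep only reads listed in File 1.
    # The File 2 header starts with "Read", which is never a key in
    # counts, so it is skipped by the same check.
    for line in file2_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] not in counts:
            continue
        read_id, transcript_id, exon, sequence = fields[:4]
        yield counts[read_id], read_id, sequence, exon, transcript_id


# Small in-memory demo with made-up IDs (not real data).
demo_file1 = ["Count Read ID Sequence", "96 READ.1 ACGT"]
demo_file2 = [
    "Read ID Transcript ID Exon Sequence",
    "READ.1 TX1 4 ACGT",
    "READ.2 TX2 3 TTTT",
]
for row in join_reads(demo_file1, demo_file2):
    print("\t".join(row))
```

Open file handles can be passed in place of the demo lists, since the function only iterates over lines. The output follows the order of File 2, not File 1; if the File 1 order matters, the rows would need to be collected and sorted afterwards.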
Thank you for your response.