Hi, my problem is that I have two files. File no. 1 is a GFF text file (say gi1) that has gene information like:
********************
gene 39389788..39395643
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
mRNA join(39389788..39389839,39390696..39390861,
39391681..39391799,39393855..39394100,39394750..39394878,
39394997..39395162,39395375..39395643)
/gene="RPSA"
/product="ribosomal protein SA, transcript variant 1"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NM_002295.4"
/db_xref="GI:70609879"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
mRNA join(39390696..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395643)
/gene="RPSA"
/product="ribosomal protein SA, transcript variant 2"
/exception="unclassified transcription discrepancy"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NM_001012321.1"
/db_xref="GI:59859884"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/codon_start=1
/product="40S ribosomal protein SA"
/protein_id="NP_001012321.1"
/db_xref="GI:59859885"
/db_xref="CCDS:CCDS2686.1"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/codon_start=1
/product="40S ribosomal protein SA"
/protein_id="NP_002286.2"
/db_xref="GI:9845502"
/db_xref="CCDS:CCDS2686.1"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
gene 39391466..39391614
/gene="SNORA6"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/db_xref="GeneID:574040"
/db_xref="HGNC:32591"
ncRNA 39391466..39391614
/gene="SNORA6"
/ncRNA_class="snoRNA"
/product="small nucleolar RNA, H/ACA box 6"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NR_002325.1"
/db_xref="GI:68510025"
/db_xref="GeneID:574040"
/db_xref="HGNC:32591"
gene 39394155..39394308
/gene="SNORA62"
/note="Derived by automated computational analysis using...
*****************************************
Now, file no. 2 is a mapped text file like:
*********************************
Gene_input_file: f3
sno_input_file: chr3
319 found_in_gene 52698648..52707224 at 52704105 and_count: 5457
68 found_in_gene 52698648..52707224 at 52705463 and_count: 6815
82 found_in_gene 52698648..52707224 at 52701967 and_count: 3319
124 found_in_gene 39793218..40244467 at 40222682 and_count: 429464
202 found_in_gene 9443305..10558922 at 10110734 and_count: 667429
228 found_in_gene 46262602..46896241 at 46629723 and_count: 367121
..and so on.
**************************************
So, I need to take each entry from file 2, say id 319, which was found in gene region 52698648..52707224 at position 52704105, and then search file 1 for the sub-location within this gene, say, whether it's in a CDS, mRNA, etc. If it's not found, the output should be:
'319 not found Intron'
Otherwise, if it's found, the output should be:
'319 found_in mRNA'
Please help me with shell scripting or Perl (or both). I am new to the Linux world. :wall:
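
To make the lookup described above concrete, here is a minimal sketch in Python (not the shell or Perl asked for, so treat it as an outline of the logic to port). It rests on several assumptions that are not stated in the post: feature lines in file 1 start with a key such as gene, mRNA, CDS or ncRNA followed by a location; a location may wrap over several lines until its parentheses close; only features lying inside the gene region given in file 2 are considered; and when a position falls in more than one feature type, all of them are reported, comma-separated. The names parse_features and classify and the ids 901 and 902 are made up for the illustration.

```python
import re

# Feature line: a key followed by a location (digit, join(, complement(, order().
FEATURE_RE = re.compile(
    r"^\s*([A-Za-z_]+)\s+((?:complement\(|join\(|order\(|[<>]?\d).*)$")
RANGE_RE = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")
HIT_RE = re.compile(r"^\s*(\S+)\s+found_in_gene\s+(\d+)\.\.(\d+)\s+at\s+(\d+)")


def parse_features(lines):
    """Return a list of (feature_key, [(start, end), ...]) from file 1."""
    features = []
    pending = None  # [key, location text] while a location spans several lines
    for line in lines:
        line = line.strip()
        if pending is None:
            m = FEATURE_RE.match(line)
            if not m:
                continue  # qualifier (/gene=...), note continuation, separator
            pending = [m.group(1), m.group(2)]
        else:
            pending[1] += line
        # The location is complete once its parentheses are balanced.
        if pending[1].count("(") == pending[1].count(")"):
            segs = [(int(a), int(b)) for a, b in RANGE_RE.findall(pending[1])]
            features.append((pending[0], segs))
            pending = None
    return features


def classify(hit_line, features):
    """Return the output line for one file 2 record, or None for other lines."""
    m = HIT_RE.match(hit_line)
    if not m:
        return None  # header lines such as "Gene_input_file: f3"
    hit_id, pos = m.group(1), int(m.group(4))
    gene_start, gene_end = int(m.group(2)), int(m.group(3))
    found = []
    for key, segs in features:
        if key == "gene" or not segs or key in found:
            continue
        lo = min(s for s, _ in segs)
        hi = max(e for _, e in segs)
        # Assumption: only look at features inside the gene region from file 2.
        if gene_start <= lo and hi <= gene_end and any(
                s <= pos <= e for s, e in segs):
            found.append(key)
    if found:
        return f"{hit_id} found_in {','.join(found)}"
    return f"{hit_id} not found Intron"


# Shortened copy of the RPSA entry from file 1 above.
SAMPLE_FILE1 = """gene 39389788..39395643
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
mRNA join(39389788..39389839,39390696..39390861,
39391681..39391799,39393855..39394100,39394750..39394878,
39394997..39395162,39395375..39395643)
/gene="RPSA"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
"""

if __name__ == "__main__":
    feats = parse_features(SAMPLE_FILE1.splitlines())
    # Hypothetical file 2 records (ids and positions invented for the demo).
    for rec in [
        "901 found_in_gene 39389788..39395643 at 39390700 and_count: 912",
        "902 found_in_gene 39389788..39395643 at 39390000 and_count: 212",
    ]:
        print(classify(rec, feats))
```

On the shortened sample this prints '901 found_in mRNA' and then '902 not found Intron'. For real files, the same two functions could be fed the lines of file 1 and file 2 read with open(); note that a position lying outside every feature is always labelled Intron here, as in the requested output, even if it is not inside any gene at all.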