Ibk
October 10, 2018, 10:59am
1
Hi,
I am trying to print every sequence in my file that matches the pattern CGTTGggtTTCATT, together with its position, but each of the three lowercase positions ("ggt") can be any nucleotide. The sequence in capital letters must match exactly.
I used
awk 'BEGIN{match("CGTTGGGTTTCATT",/(GGT)+/);print RSTART,RLENGTH}' my_file > output
but didn't get the expected result.
Can anyone help?
A sample input file and the desired output would be a good start (as always).
Ibk
October 10, 2018, 11:14am
3
>my_file
GTGTGTCATTTTAGCCCGTTGGGTTTCATTAAGGTGTGTCACCAGGTGGGTGGTACCTGGAGGTTATTCT
ATTGGGATAACGAGAGGAGGAGGGGCTAGAGGTCCGCGAGATTTGGGGTAGGCGGAGCCTCAGGAGGGTC
CCCTCCATAGGGTTGAACCAGGAGGGGGAGGATTGGGCTCCGCCCCGATATACCTAGTGGGTGGAGCCTA
Expected output
output
CGTTGGGTTTCATT 17 30
Something along these lines:
awk 'match($0,/CGTTG...TTCATT/) {print substr($0,RSTART,RLENGTH), RSTART, RSTART+RLENGTH-1}' myFile
Ibk
October 10, 2018, 11:58am
5
Thanks,
The pattern also occurs at other positions; I need to get every position where the pattern matches.
There's nothing in the suggested solution that assumes any particular "position".
Then please provide a representative sample and desired output.
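In the meantime, here is a rough sketch of one way to report every hit rather than only the first one on each line. Note that match() only ever returns the leftmost match, so the idea is to call it in a loop on the not-yet-scanned remainder of the line. Several things here are assumptions, not something the thread confirms: the function name find_motif is made up, the extra line-number column is my addition, the second test line is invented to show two hits on one line, positions are counted per line (1-based, inclusive end), and a hit that is split across a line break will not be found.

```shell
# find_motif: print "sequence line start end" for every hit of
# CGTTG + any three characters + TTCATT. Reads the named files, or stdin.
find_motif() {
  awk '{
    rest = $0      # unscanned remainder of the current line
    offset = 0     # number of characters already dropped from the front
    while (match(rest, /CGTTG...TTCATT/)) {
      start = offset + RSTART
      print substr(rest, RSTART, RLENGTH), NR, start, start + RLENGTH - 1
      # continue one character past the match start so overlapping hits are kept
      offset += RSTART
      rest = substr(rest, RSTART + 1)
    }
  }' "$@"
}

# First line is from the sample file above; the second line is invented.
printf '%s\n' \
  'GTGTGTCATTTTAGCCCGTTGGGTTTCATTAAGGTGTGTCACCAGGTGGGTGGTACCTGGAGGTTATTCT' \
  'AACGTTGACCTTCATTGGCGTTGTTTTTCATTAA' | find_motif
# prints:
# CGTTGGGTTTCATT 1 17 30
# CGTTGACCTTCATT 2 3 16
# CGTTGTTTTTCATT 2 19 32
```

On a real file this would be called as find_motif my_file > output. The dots accept any character; if only A, C, G and T should be allowed in the middle, each dot could be replaced with [ACGT].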