Hi,
I'm trying to match the front and back of a sequence. It works when there is an exact match (obviously), but I need the regex to be more flexible. When we get strings of nucleotides, their prefixes and suffixes sometimes aren't exact matches: sometimes there is an extra letter, sometimes a letter is missing, and sometimes both.
For example, if I were trying to match the string "Imhungry" at the front of a string and replace it with nothing, I would use the following code:
$sequence =~ s/^.*?Imhungry//s;
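To make the exact-match behaviour concrete, here is a small sketch using made-up sample strings. `.*?` lazily skips the shortest prefix it can, and then "Imhungry" has to match letter for letter, so a sequence with a single missing letter is left untouched:

```perl
use strict;
use warnings;

# .*? lazily skips the shortest prefix it can; "Imhungry" must then match exactly.
for my $sequence ("xxxImhungry-payload", "xxxImungry-payload") {
    (my $trimmed = $sequence) =~ s/^.*?Imhungry//s;
    print "$sequence -> $trimmed\n";
}
# xxxImhungry-payload -> -payload
# xxxImungry-payload -> xxxImungry-payload   (one letter missing, so no match)
```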
This works great, but I need help building some flexibility into the regex so that it also catches instances where:
[1] a single letter is missing, e.g. "Imungry" or "mungry".
[2] a single letter is added (any letter), e.g. "Immhungry" or "Imhungryy".
[3] both, e.g. "Imhungyy" or "Immungryy". (Notice that this last example has two single-letter duplications and one deletion.)
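Cases [1] to [3] are all "within N single-letter edits" of the target, which is what Levenshtein (edit) distance measures. The sketch below swaps the regex for that technique: it is not a regex, and `approx_match_end`, `trim_prefix` and the `$max_edits` limit are hypothetical names chosen for illustration. The match may start anywhere in the text (mirroring the leading `.*?`), and it picks the closest copy of the pattern, preferring the earliest end on ties:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: find where an approximate copy of $pattern ends in
# $text, allowing single-letter insertions, deletions and substitutions
# (Levenshtein distance). The copy may start anywhere in $text, much like
# the leading .*? in the regex. Returns the end offset of the closest copy
# (earliest one on ties), or undef if even that needs more than $max_edits.
sub approx_match_end {
    my ($pattern, $text, $max_edits) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;
    my $m = scalar @p;

    # $col[$i] = fewest edits needed for the first $i pattern letters to end
    # at the current text position. Before any text is read this is just $i.
    my @col = (0 .. $m);
    my ($best_dist, $best_end) = ($col[$m], 0);

    for my $j (1 .. scalar @t) {
        my @next = (0);    # an empty pattern prefix costs nothing anywhere
        for my $i (1 .. $m) {
            my $cost = $p[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $next[$i] = min(
                $col[$i - 1] + $cost,    # letters match, or one is substituted
                $col[$i] + 1,            # extra letter in the text
                $next[$i - 1] + 1,       # letter missing from the text
            );
        }
        @col = @next;
        ($best_dist, $best_end) = ($col[$m], $j) if $col[$m] < $best_dist;
    }
    return $best_dist <= $max_edits ? $best_end : undef;
}

# Hypothetical helper: drop everything up to and including the approximate
# match, or return the text unchanged if nothing is close enough.
sub trim_prefix {
    my ($text, $pattern, $max_edits) = @_;
    my $end = approx_match_end($pattern, $text, $max_edits);
    return defined $end ? substr($text, $end) : $text;
}

# Made-up sample sequences; each line ends in "-> -rest" with $max_edits = 2.
for my $s ("Imhungry-rest", "Imungry-rest", "Immhungry-rest", "mungry-rest") {
    print "$s -> ", trim_prefix($s, "Imhungry", 2), "\n";
}
```

Two design consequences worth knowing: because it takes the closest copy rather than the leftmost one, a better match later in the string wins over a worse one earlier; and a trailing duplicate such as "Imhungryy" leaves the extra "y" behind, since an exact match already ends one letter earlier.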
Thanks!
If this is too absurd, let me know.
I think I can do this with a {0,2} quantifier on each letter:
$sequence =~ s/^.*?I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2}//s;
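One thing to watch with the `{0,2}` version: every piece of it is optional, so the whole pattern can match the empty string. `.*?` is then satisfied immediately at position 0 and nothing is removed unless the string happens to start with the target. It also only allows doubled letters, not an arbitrary extra letter. A sketch of a wildcard-based alternative, assuming at most one inserted or one deleted letter, is to generate each single-edit variant explicitly and join them into one alternation; `one_edit_regex` is a hypothetical helper name and the sample strings are made up:

```perl
use strict;
use warnings;

# The {0,2} attempt: every piece is optional, so the whole pattern can match
# the empty string, and .*? is satisfied immediately at position 0.
(my $broken = "xxImhungry-rest") =~ s/^.*?I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2}//s;
print "$broken\n";    # still "xxImhungry-rest": nothing was removed

# Hypothetical helper: build a regex that matches $word exactly, with one
# extra character (the "." wildcard) inserted, or with one letter deleted.
sub one_edit_regex {
    my ($word) = @_;
    my @c = split //, $word;
    my @alts = (quotemeta $word);    # exact form first, so it wins when present

    # One wildcard inserted before each letter and after the last one.
    for my $i (0 .. $#c + 1) {
        push @alts, quotemeta(join '', @c[0 .. $i - 1]) . '.'
                  . quotemeta(join '', @c[$i .. $#c]);
    }
    # One letter deleted at each position.
    for my $i (0 .. $#c) {
        push @alts, quotemeta(join '', @c[0 .. $i - 1], @c[$i + 1 .. $#c]);
    }

    my %seen;
    my $alt = join '|', grep { !$seen{$_}++ } @alts;    # drop duplicates
    return qr/(?:$alt)/s;
}

my $re = one_edit_regex("Imhungry");
for my $s ("xxImhungry-rest", "xxImungry-rest", "xxImmhungry-rest") {
    (my $trimmed = $s) =~ s/^.*?$re//s;
    print "$s -> $trimmed\n";    # each line ends in "-> -rest"
}
```

This only covers one edit at a time, so combinations like those in [3] (or a double deletion such as "mungry") are not matched. And because the exact form is tried first, a trailing duplicate such as "Imhungryy" still leaves the extra "y" behind.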