Please suggest a script or solution?

I have to solve a programming problem for my wife, who is engaged in breast cancer research.

  1. She frequently has to search a long single line of alphabetic characters (lower case) for an exact match of a string.

    e.g. mwaaagglwrsraglralfrsrdaalfpgcerglhcsavscknwlkkfasktkkkvwyespslgshstykpskleflmrstskktrkedharlralngllykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapqsrpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls

  2. The ONLY two strings to be searched for are -

    r-r--s
    r-r--t

    The - can be any of the following characters

      acdefghiklmnpqrstvyz
    
  3. Once an exact match has been made it is essential to know the number of characters from the start of the line inclusive of the 6 character string.

Can anyone suggest a program or script?

It is urgent.

Thanks

Nev

Well, something like the following pattern match can be used.

/r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz][acdefghiklmnpqrstvyz][st]/
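
To check what a pattern like this accepts, here is a small sketch in Python (not Perl). The names `ALLOWED`, `MOTIF` and `is_motif` are made up for illustration, and the last position is written as `[st]`, since inside a character class a `|` is a literal character rather than "or".

```python
import re

# Characters allowed in the "-" positions, copied from the question.
ALLOWED = "acdefghiklmnpqrstvyz"
ANY = "[" + ALLOWED + "]"

# r-r--s or r-r--t
MOTIF = re.compile("r" + ANY + "r" + ANY + ANY + "[st]")


def is_motif(candidate):
    """Return True if the whole candidate string is one r-r--s / r-r--t motif."""
    return MOTIF.fullmatch(candidate) is not None


if __name__ == "__main__":
    for candidate in ("rarggs", "rarggt", "rbrggs", "rarjxs"):
        print(candidate, is_motif(candidate))
```

Strings with a character outside the allowed set in a "-" position (such as the b, j or x above) are rejected, which is the difference between the explicit character class and a plain "." wildcard.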

Using Perl's index() or substr() would probably be the best way to go, I think. I will work on this tomorrow just to know for myself how to do it, but I will be excited to see what others come up with before I can post again.

This gives me something to think about tonight. Heh.

Mmm, some of the logic in this would go like so if index() is used.

Load the string into the index function.
index() returns the position of the leftmost occurrence at or after an optional starting position, so a single call only finds one match. If there are two or more matches in the string, take the return value of the first search, add 1, and pass that as the starting position of the next search, repeating this in a loop until the end of the string. Note that index() looks for a literal substring, so the text to look for has to come from the pattern match first.

The return value of index() is the number of characters before the match (counting from 0), so adding 6 gives the number of characters from the start of the line including the 6-character string. That should fulfill your request.
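
The loop idea above (search, then restart just past the previous hit) could look roughly like this. It is a sketch in Python rather than Perl, the name `find_end_positions` is hypothetical, and it assumes the wanted number is the 1-based position of the last character of each 6-character match.

```python
import re

# Characters allowed in the "-" positions, as listed in the question.
ANY = "[acdefghiklmnpqrstvyz]"
# r-r--s or r-r--t
MOTIF = re.compile("r" + ANY + "r" + ANY + ANY + "[st]")


def find_end_positions(line):
    """Return (match, end_position) pairs for every match in the line.

    end_position is the number of characters from the start of the line
    up to and including the last character of the match.
    """
    hits = []
    start = 0
    while True:
        m = MOTIF.search(line, start)
        if m is None:
            break
        hits.append((m.group(), m.end()))
        # Restart one character after the start of this hit so that
        # overlapping matches are not skipped.
        start = m.start() + 1
    return hits


if __name__ == "__main__":
    for text, position in find_end_positions("aarerqqsrkrgat"):
        print(text, position)
```

Restarting at `m.start() + 1` instead of at the end of the match is a design choice: it also reports a second match that begins inside the first one.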

what do you guys think?

If I understand what you're asking, try this. This script does what I think you want done. At least, I think it does...

#! /usr/bin/ksh

##  r-r--s
##  r-r--t

longset="[acdefghiklmnpqrstvyz]"
pattern="r${longset}r${longset}${longset}[ts]"
typeset -u upshift

linen=0
IFS=""
while read input ; do
        orig=$input
        matches=0
        pos=1
        ((linen=linen+1))
        image=""
        while ((${#input})) ; do
                preamble="Line: ${linen} At position"
                if [[ $input = *(?)${pattern}*(?) ]] ; then
                        ((matches=matches+1))
                        leftover=${input#*${pattern}}
                        temp=${input%${leftover}}
                        lead=${temp%${pattern}}
                        this=${temp#${lead}}
                        upshift=${this}
                        input=$leftover
                        if ((${#lead})) ; then
                                echo $preamble $pos ${#lead} unmatched characters
                                image="${image}${lead}"
                                ((pos=pos+${#lead}))
                        fi
                        echo $preamble $pos MATCH: $this
                        image="${image}${upshift}"
                        ((pos=pos+${#this}))
                else
                        if ((matches)) ; then
                                echo $preamble $pos ${#input} trailing characters
                        fi
                        image="${image}${input}"
                        input=""
                fi
        done
        if ((matches)) ; then
                echo "$image"
                echo
                echo
        fi
done
exit 0

Put the lines to be searched into a data file and run this script against them. Something like:
./thisscript < data.file

#!/usr/bin/perl -w

# The "-" positions may only hold one of these characters.
$any   = '[acdefghiklmnpqrstvyz]';
$motif = 'r' . $any . 'r' . $any . $any . '[st]';       # r-r--s or r-r--t

while (<>) {
        chomp;

        if ($_ =~ /($motif)/ ) {        # see if the current line matches what we are looking for.
                $found = index($_,$1);  # 0-based offset of the first match; $found+6 is the position of its last character
                $text = $_;
                $text =~ s/($motif)/\U$1/g;     # upper-case every match so it stands out in the report
                write (STDOUT);         # print out the report.
        }
}



format STDOUT =
@<<<<< @<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$.,($found+6),$text
              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$text
              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$text
+-----------------------------+
.

format STDOUT_TOP =
+=================================+
|COOL STUFF BY OPTIMUSP at UNIXCOM|
+=================================+
Line Position Text
==== ======== ====
.

MY TEST DATA
mwaaagglwrsraragtlralfrsrdaalfpgcerglhcstrarjxsatacsavswlkkfaslgshstykpskleflmrstskktrkedharlralngll
ykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapq
srpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls

So far this is the base code. The only thing left is something to iterate through the line to see if there is more than 1 match. But then again, if you have at least 1 match you can study that line a bit more for a final inspection.
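
One possible way to cover that remaining step, listing every match on a line rather than only the first, is sketched below in Python (not Perl). The function name `report` and the tuple layout are illustrative assumptions; the pattern is wrapped in a lookahead so that overlapping matches are listed too.

```python
import re

ANY = "[acdefghiklmnpqrstvyz]"
# Wrapping the motif in a lookahead lets finditer() report overlapping hits.
MOTIF = re.compile("(?=(r" + ANY + "r" + ANY + ANY + "[st]))")


def report(lines):
    """Yield (line_number, end_position, marked_line) for every match.

    end_position counts characters from the start of the line up to and
    including the last character of the match (1-based). marked_line is
    the line with that one match upper-cased.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        for m in MOTIF.finditer(line):
            start, end = m.start(1), m.end(1)
            marked = line[:start] + line[start:end].upper() + line[end:]
            yield line_number, end, marked


if __name__ == "__main__":
    # Made-up sample lines; real input could be read from a data file instead.
    sample = ["xxrarggsxx", "nothing here", "rsrsrsst"]
    for line_number, end, marked in report(sample):
        print(line_number, end, marked)
```

Each match gets its own report row, so a line with two matches appears twice, once per highlighted match.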

Okay

Thanks the ksh script works fine.
I added some refinements including a log file.

I haven't tried the Perl yet, but I am sure it will work.

Thanks guys.

:frowning: