Please suggest a script or solution?

nmsinghe · September 18, 2002, 4:08pm

I have to solve a programming problem for my wife who is engaged in Research in Breast Cancer.

She frequently has to search a long single line of lowercase alphabetic characters for an exact match of a string.

e.g. mwaaagglwrsraglralfrsrdaalfpgcerglhcsavscknwlkkfasktkkkvwyespslgshstykpskleflmrstskktrkedharlralngllykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapqsrpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls
The ONLY two strings to be searched for are -

r-r--s
r-r--t

The - can be any of the following characters
```
  acdefghiklmnpqrstvyz
```
Once an exact match has been found, it is essential to know the number of characters from the start of the line up to and including the 6-character match.

Can anyone suggest a program or script?

It is urgent.

Thanks

Nev

Optimus_P · September 18, 2002, 6:23pm

Well, something like the following pattern match can be used:

/r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz][acdefghiklmnpqrstvyz][st]/
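
A minimal sketch of the same pattern written as a POSIX shell glob, assuming a candidate is tested six characters at a time. The helper name `is_motif` is hypothetical and does not come from this thread. Note that the last class is `[st]`: inside brackets a `|` is matched literally and does not mean "or".

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is exactly one r-r--s / r-r--t string.
is_motif() {
    any='[acdefghiklmnpqrstvyz]'   # letters allowed at each "-" position
    case $1 in
        r${any}r${any}${any}[st]) return 0 ;;
        *) return 1 ;;
    esac
}

is_motif rsragt && echo "rsragt: match"
is_motif rsragl || echo "rsragl: no match (does not end in s or t)"
is_motif rwrags || echo "rwrags: no match (w is not in the allowed set)"
```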

Using Perl's index() or substr() would probably be the best way to go, I think. I will work on this tomorrow just to know for myself how to do it, but I will be excited to see what others come up with before I can post again.

This gives me something to think about tonight. Heh.

Mmm, some of the logic would go like this if index() is used:

Load the string into the index() function.
index() returns the position of the first (leftmost) occurrence of a fixed substring at or after a given starting position. So if the line contains two matches, take the return value of the first search, add 1, and use that as the starting position of the next index() search. Toss this in a loop until the end of the string (index() returns -1 when nothing more is found).

The return value of index() is the number of characters before the match starts (it counts from 0), so adding 6 gives the count inclusive of the 6-character string. That should fulfill your request.

What do you guys think?

Perderabo · September 18, 2002, 7:50pm

If I understand what you're asking, try this. This script does what I think you want done. At least, I think it does...

#! /usr/bin/ksh

##  r-r--s
##  r-r--t

longset="[acdefghiklmnpqrstvyz]"
pattern="r${longset}r${longset}${longset}[ts]"
typeset -u upshift

linen=0
IFS=""
while read input ; do
        orig=$input
        matches=0
        pos=1
        ((linen=linen+1))
        image=""
        while ((${#input})) ; do
                preamble="Line: ${linen} At position"
                if [[ $input = *(?)${pattern}*(?) ]] ; then
                        ((matches=matches+1))
                        leftover=${input#*${pattern}}
                        temp=${input%${leftover}}
                        lead=${temp%${pattern}}
                        this=${temp#${lead}}
                        upshift=${this}
                        input=$leftover
                        if ((${#lead})) ; then
                                echo $preamble $pos ${#lead} unmatched characters
                                image="${image}${lead}"
                                ((pos=pos+${#lead}))
                        fi
                        echo $preamble $pos MATCH: $this
                        image="${image}${upshift}"
                        ((pos=pos+${#this}))
                else
                        if ((matches)) ; then
                                echo $preamble $pos ${#input} trailing characters
                        fi
                        image="${image}${input}"
                        input=""
                fi
        done
        if ((matches)) ; then
                echo "$image"
                echo
                echo
        fi
done
exit 0

Put the lines to be searched into a data file and run the script against it. Something like:
./thisscript < data.file
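
The script reports the position where each match starts, while the original question asked for the character count up to and including the six-character match. Here is a minimal sketch of that conversion, assuming the `Line: N At position P MATCH: xxxxxx` output lines the script prints. The filter name `match_ends` is hypothetical, and the sample input line is hand-written for illustration.

```shell
#!/bin/sh
# Hypothetical filter: reads the script's output on stdin and, for each
# MATCH line, prints the line number, the inclusive end count, and the match.
match_ends() {
    # Fields: "Line:" N "At" "position" P "MATCH:" xxxxxx
    while read -r w1 linen w3 w4 pos kind text; do
        if [ "$kind" = "MATCH:" ]; then
            # A match starting at 1-based position P ends at P + length - 1.
            echo "$linen $((pos + ${#text} - 1)) $text"
        fi
    done
}

# One hand-written line in the assumed format; prints "1 14 rsragt".
echo "Line: 1 At position 9 MATCH: rsragt" | match_ends
```

In practice this would sit at the end of the pipeline, for example `./thisscript < data.file | match_ends`.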

Optimus_P · September 19, 2002, 12:44pm

#!/usr/bin/perl -w

while (<>) {
        chomp;

        if ($_ =~ /(r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz]{2}[st])/ ) {      # see if the current line matches what we are looking for.
                $found = index($_,$1);  # if we find a match find out how many positions over in the line the match is
                write (STDOUT);         # print out the report.
        }
}



format STDOUT =
@<<<<< @<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$.,($found+6),(s/(r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz]{2}[st])/\U$1/g)
              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$_
              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$_
+-----------------------------+
.

format STDOUT_TOP =
+=================================+
|COOL STUFF BY OPTIMUSP at UNIXCOM|
+=================================+
Line Position Text
==== ======== ====
.

MY TEST DATA
mwaaagglwrsraragtlralfrsrdaalfpgcerglhcstrarjxsatacsavswlkkfaslgshstykpskleflmrstskktrkedharlralngll
ykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapq
srpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls

So far this is the base code. The only thing left is something to iterate through the line to see if there is more than one match. But then again, if you have at least one match you can study that line a bit more for a final inspection.
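
One possible way to handle the "more than one match" case is sketched below in POSIX shell (not the Perl above): slide along the line one character at a time and test the next six characters at each step, so overlapping matches are reported as well. The function name `find_motifs` and the sample string are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "END MATCH" for every r-r--s / r-r--t string in $1,
# where END is the 1-based count of characters up to and including the match.
find_motifs() {
    any='[acdefghiklmnpqrstvyz]'   # letters allowed at each "-" position
    rest=$1                        # part of the line not yet scanned
    start=1                        # 1-based position of the first char of $rest
    while [ "${#rest}" -ge 6 ]; do
        case $rest in
            r${any}r${any}${any}[st]*)
                after=${rest#??????}       # everything past the first 6 chars
                match=${rest%"$after"}     # the 6 matching characters
                echo "$((start + 5)) $match"
                ;;
        esac
        rest=${rest#?}             # drop one character and test again
        start=$((start + 1))
    done
}

# Prints "8 rsrsrs" and then "10 rsrstt" (the two matches overlap).
find_motifs "aarsrsrsttz"
```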

nmsinghe · September 20, 2002, 10:15am

Okay

Thanks, the ksh script works fine.
I added some refinements including a log file.

I haven't yet tried the Perl but I am sure it will work.

Thanks guys.