I have to solve a programming problem for my wife who is engaged in Research in Breast Cancer.
-
She frequently has to search a long single line of lowercase alphabetic characters for an exact match of a string.
e.g. mwaaagglwrsraglralfrsrdaalfpgcerglhcsavscknwlkkfasktkkkvwyespslgshstykpskleflmrstskktrkedharlralngllykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapqsrpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls
-
The ONLY two strings to be searched for are:
r-r--s
r-r--t
Each - can be any one of the following characters:
acdefghiklmnpqrstvyz
-
Once an exact match has been made, it is essential to know the number of characters from the start of the line up to and including the last character of the 6-character match.
Can anyone suggest a program or script?
It is urgent.
Thanks
Nev
Well, something like the following pattern match can be used:
/r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz][acdefghiklmnpqrstvyz][st]/
Using Perl's index() or substr() would probably be the best way to go, I think. I will work on this tomorrow just to know for myself how to do it, but I will be excited to see what others come up with before I can post again.
This gives me something to think about tonight. Heh.
Hmm, some of the logic would go like this if index() is used:
Load the string into the index function. Since index() looks for a literal substring, the pattern match has to pull out the matched text first.
index() returns the leftmost occurrence at or after a given starting position. So if there are two matches in the string, take the return value of the first index() call, use it (plus one) as the starting position of the next call, and repeat in a loop until the end of the string.
The return value of index() is the number of characters before the match (a zero-based offset), so adding 6 gives the count up to and including the match. That should fulfill your request.
What do you guys think?
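For what it's worth, the first step above (finding where the leftmost match ends) can be sketched in plain POSIX shell rather than with Perl's index(). This is only a sketch under assumptions: the names any, pat and first_match_end are made up for illustration, and it leans on the shell's own prefix stripping instead of index().

```sh
#!/bin/sh
# Hypothetical names; the character set is the one given in the question.
any='[acdefghiklmnpqrstvyz]'
pat="r${any}r${any}${any}[st]"

# Print the number of characters from the start of $1 up to and
# including the leftmost match, or 0 when there is no match.
first_match_end() {
    s=$1
    # Strip the shortest prefix that ends with a match. Every match is
    # six characters long, so this prefix ends at the leftmost match.
    rest=${s#*$pat}
    if [ "$rest" = "$s" ]; then
        echo 0      # nothing was stripped, so nothing matched
    else
        echo $(( ${#s} - ${#rest} ))
    fi
}
```

For example, first_match_end aarsrdds should print 8, because the match rsrdds ends at the eighth character. A string with no match prints 0.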
If I understand what you're asking, try this. This script does what I think you want done. At least, I think it does...
#! /usr/bin/ksh
## r-r--s
## r-r--t
longset="[acdefghiklmnpqrstvyz]"
pattern="r${longset}r${longset}${longset}[ts]"
typeset -u upshift    # anything assigned to upshift is converted to upper case
linen=0
IFS=""
while read input ; do
    orig=$input
    matches=0
    pos=1
    ((linen=linen+1))
    image=""
    # consume the line from the left, one match at a time
    while ((${#input})) ; do
        preamble="Line: ${linen} At position"
        if [[ $input = *(?)${pattern}*(?) ]] ; then
            ((matches=matches+1))
            leftover=${input#*${pattern}}    # text after the first match
            temp=${input%${leftover}}        # text up to and including the match
            lead=${temp%${pattern}}          # unmatched text before the match
            this=${temp#${lead}}             # the six matched characters
            upshift=${this}
            input=$leftover
            if ((${#lead})) ; then
                echo $preamble $pos ${#lead} unmatched characters
                image="${image}${lead}"
                ((pos=pos+${#lead}))
            fi
            echo $preamble $pos MATCH: $this
            image="${image}${upshift}"
            ((pos=pos+${#this}))
        else
            if ((matches)) ; then
                echo $preamble $pos ${#input} trailing characters
            fi
            image="${image}${input}"
            input=""
        fi
    done
    # print the line with every match upper-cased
    if ((matches)) ; then
        echo "$image"
        echo
        echo
    fi
done
exit 0
Put the lines to be searched into a data file and feed it to this script. Something like:
./thisscript < data.file
#!/usr/bin/perl -w
# The allowed set is spelled out for every wildcard position; "." would also accept characters outside it.
my $pat = qr/r[acdefghiklmnpqrstvyz]r[acdefghiklmnpqrstvyz]{2}[st]/;
my $found;
while (<>) {
    chomp;
    if (/($pat)/) {                 # see if the current line matches what we are looking for.
        $found = index($_, $1);     # zero-based offset of the first match in the line
        s/($pat)/\U$1/g;            # upper-case every match so it stands out in the report
        write(STDOUT);              # print out the report.
    }
}
format STDOUT =
@<<<<< @<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$., ($found + 6), $_
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$_
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$_
+-----------------------------+
.
format STDOUT_TOP =
+=================================+
|COOL STUFF BY OPTIMUSP at UNIXCOM|
+=================================+
Line Position Text
==== ======== ====
.
MY TEST DATA
mwaaagglwrsraragtlralfrsrdaalfpgcerglhcstrarjxsatacsavswlkkfaslgshstykpskleflmrstskktrkedharlralngll
ykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapq
srpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls
So far this is the base code. The only thing left is something to iterate through the line to see if there is more than one match. Then again, if you have at least one match you can study that line a bit more for a final inspection.
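One possible way to do that iteration, sketched in POSIX shell rather than Perl: test the pattern at every starting position and step forward one character at a time, so overlapping matches are reported too. The function names all_match_ends and report are hypothetical. Each number reported is the count of characters from the start of the line up to and including that match.

```sh
#!/bin/sh
# Hypothetical names; the character set is the one given in the question.
any='[acdefghiklmnpqrstvyz]'
pat="r${any}r${any}${any}[st]"

# Print the end position of every match in $1, space-separated.
all_match_ends() {
    rest=$1
    start=1
    ends=""
    while [ "${#rest}" -ge 6 ]; do
        case $rest in
            # a match begins at the current position and ends 5 later
            $pat*) ends="${ends}${ends:+ }$((start + 5))" ;;
        esac
        rest=${rest#?}          # drop one character and try again
        start=$((start + 1))
    done
    echo "$ends"
}

# Read lines from standard input and print "line N: e1 e2 ..." for
# each line that contains at least one match.
report() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
        found=$(all_match_ends "$line")
        if [ -n "$found" ]; then
            echo "line $n: $found"
        fi
    done
    return 0
}
```

Assuming the block is saved as a script with a final line that calls report, it can be run as ./thatscript < data.file. The scan copies the remaining text at every step, so it is simple rather than fast, which should be acceptable for lines of a few hundred characters.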
Okay
Thanks, the ksh script works fine.
I added some refinements including a log file.
I haven't yet tried the Perl, but I am sure it will work.
Thanks guys.