A C program required for portability

I have to solve a problem for my wife, who is engaged in breast cancer research.

  1. She frequently has to search a long single line of lower-case alphabetic characters for an exact match of a string.

e.g. mwaaagglwrsraglralfrsrdaalfpgcerglhcsavscknwlkkfasktkkkvwyespslgshstykpskleflmrstskktrkedharlralngllykaltdllctpevsqelydlnvelskvsltpdfsacraywkttlsaeqnahmeavlqrsaahmslisywqsqtldpgmkettlykmisgtlmphnpaapqsrpqapvcvgsimrrstsrlwstkggkikgsgawcgrgrwls

  2. The ONLY two strings to be searched for are:

r-r--s
r-r--t

Each - can be any one of the following characters:

acdefghiklmnpqrstvxy

  3. Once one or more exact matches have been found, it is essential to know, for each match, the number of characters from the start of the line up to and including the 6-character string.

Can anyone suggest a program in ANSI C that will compile in the first instance on Solaris (SunOS 5.9), but is also portable (copy the source, then recompile) to HP-UX, AIX and XP?

It is urgent.

Thanks

Nev

P.S. The immediate need has been solved with a ksh script, but C is necessary to match some other utilities.

Also, we have a further problem to solve: the raw data, although shown as one line above, arrives as many lines (sometimes as many as 50), so we have to join these lines to make one single line. Joining does not alter the data at all, as it is split across many lines only for ease of display.

OK, to answer the second part first: to strip the line breaks out of your input file, do:

awk '{printf("%s", $0)}' filename > newfile

The awk command should be standard across most Unix O/S.
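Since the rest of the job is wanted in C anyway, the same joining step can be sketched in ANSI C using only <stdio.h>. This is a minimal sketch rather than a tested utility: the function name join_lines is made up for illustration, and it assumes that dropping every line feed and carriage return (and writing one newline at the end) is what is wanted.

```c
#include <stdio.h>

/*
 * join_lines (hypothetical helper): copy everything from 'in' to 'out',
 * dropping line feeds and carriage returns so the data ends up on a
 * single line. One newline is written at the very end.
 * Returns the number of data characters copied.
 */
long join_lines(FILE *in, FILE *out)
{
    int c;
    long copied = 0;

    while ((c = getc(in)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;           /* skip line breaks, keep everything else */
        putc(c, out);
        copied++;
    }
    putc('\n', out);            /* terminate the single joined line */
    return copied;
}
```

Called as join_lines(stdin, stdout) from main, it would behave as a filter, e.g. joinprog < filename > newfile (joinprog being whatever you name the executable).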

On HP-UX, use the functions regcomp and regexec (declared in <regex.h>). These functions let you process regular expressions; in other words, you can search a string for a pattern (like the ones you have described).

Sorry I don't have time to explain these functions in depth or write you a little test program, but regexec tells you where it finds the first match in your line, as start and end offsets in a regmatch_t structure. Calling it again on the same string just finds the same match, so to find the next one you call it on the part of the string that follows the start of the previous match. The end offset of each match, added to the position the search started from, is the number of characters from the start of the line up to and including that match.
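For what it's worth, here is roughly how that loop might look with the POSIX regcomp/regexec interface. Treat it as a sketch under assumptions, not a finished program: the function name find_motifs_regex, the WILD macro and the ends/max parameters are invented for the example, and the bracket expression is simply the list of allowed characters from the question. Restarting the search one character after the start of each match means overlapping matches are reported too.

```c
#include <regex.h>
#include <stddef.h>

/* Characters allowed in each '-' position (taken from the question). */
#define WILD "[acdefghiklmnpqrstvxy]"

/*
 * find_motifs_regex (hypothetical helper): search seq for r-r--s / r-r--t.
 * For each match, store the number of characters from the start of seq
 * up to and including the match in ends[] (at most max entries).
 * Returns the number of matches found, or 0 if the pattern fails to compile.
 */
size_t find_motifs_regex(const char *seq, size_t *ends, size_t max)
{
    regex_t re;
    regmatch_t m;
    size_t offset = 0;  /* where the current search starts within seq */
    size_t n = 0;

    if (regcomp(&re, "r" WILD "r" WILD WILD "[st]", REG_EXTENDED) != 0)
        return 0;

    /* rm_so and rm_eo are relative to seq + offset, not to seq itself. */
    while (regexec(&re, seq + offset, 1, &m, 0) == 0) {
        if (n < max)
            ends[n] = offset + (size_t)m.rm_eo;
        n++;
        offset += (size_t)m.rm_so + 1;  /* step past this match's start */
    }

    regfree(&re);
    return n;
}
```

With the input "rararsat" this should report two overlapping matches, ending at characters 6 and 8.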

I do know that you may have portability issues with the regexec and regcomp functions. When my trainee wrote a program and ported it to Windows, he found that a different library with different function names was required. To make matters worse, the Windows functions had different rules for regular expressions.
I would not be surprised if you encountered similar issues between different flavours of Unix.

If portability is an issue you may have to write your own parsing algorithms ...
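Because the pattern is fixed at six characters, a hand-written matcher can be very small and needs nothing beyond the standard C library, so it should compile unchanged on Solaris, HP-UX, AIX and XP. The following is only a sketch of one way to do it: the names is_wild, match_at and find_motifs and the ends/max parameters are made up for illustration, and the allowed-character list is copied from the question.

```c
#include <stddef.h>
#include <string.h>

/* Characters allowed in each '-' position (taken from the question). */
static const char WILD_SET[] = "acdefghiklmnpqrstvxy";

/* Return 1 if c is one of the allowed wildcard characters, else 0. */
static int is_wild(char c)
{
    return c != '\0' && strchr(WILD_SET, c) != NULL;
}

/* Return 1 if the six characters starting at p match r-r--s or r-r--t. */
static int match_at(const char *p)
{
    return p[0] == 'r' && is_wild(p[1]) &&
           p[2] == 'r' && is_wild(p[3]) && is_wild(p[4]) &&
           (p[5] == 's' || p[5] == 't');
}

/*
 * find_motifs (hypothetical helper): try every start position in seq.
 * For each match, store the number of characters from the start of seq
 * up to and including the 6-character match in ends[] (at most max entries).
 * Returns the number of matches found.
 */
size_t find_motifs(const char *seq, size_t *ends, size_t max)
{
    size_t len = strlen(seq);
    size_t i, n = 0;

    for (i = 0; i + 6 <= len; i++) {
        if (match_at(seq + i)) {
            if (n < max)
                ends[n] = i + 6;
            n++;
        }
    }
    return n;
}
```

Since every start position is tested, overlapping matches are found as well; for example "rararsat" gives two matches, ending at characters 6 and 8.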

Sorry I can't be more help but I have run out of time ...