Hello,
Splitting text into sentences at the full stop, question mark or exclamation mark is a common device. The question mark and exclamation mark do not pose much of a problem, but the full stop as a sentence delimiter raises certain issues because of its varied uses: it also appears inside temperatures, currency amounts, acronyms and dates,
just to name a few.
Standard parsers such as the Stanford parser do not handle this correctly and treat the full stop as a delimiter wherever it occurs.
A Perl script would do the job, but since I am working on dynamic data where on-the-fly detection is needed, I am looking for a regex which correctly ignores the above cases and identifies only the valid sentence-ending full stops.
Using proximity, i.e. ignoring a full stop if only a couple of words separate it from the next full stop, is a possibility, but it does not work in all cases.
Does anyone know of a solution to this thorny issue? Many thanks in advance for your help.
Hello,
Maybe I was not very clear. What I want is a regex that identifies the full-stop as an end of sentence and excludes all other full-stops as listed in my mail which are not sentence delimiters but delimit entities such as Temperature, Currency, Acronyms, Dates etc.
Many thanks once again
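To make the request concrete, here is a minimal Java sketch of one possible heuristic: a full stop is treated as a sentence end only when it is followed by whitespace and an uppercase letter, or by the end of the input. The class name `SentenceEndFinder`, the method `sentenceEndOffsets` and the rule itself are assumptions made for illustration, not a complete solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and heuristic are illustrative assumptions.
class SentenceEndFinder {
    // Match a full stop only if whitespace plus an uppercase letter follows,
    // or if nothing but optional whitespace remains before the end of input.
    static final Pattern SENTENCE_END =
            Pattern.compile("\\.(?=\\s+[A-Z]|\\s*$)");

    // Return the character offsets of the full stops judged to end a sentence.
    static List<Integer> sentenceEndOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = SENTENCE_END.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "The temperature was 32.8 degrees Celsius. "
                + "It was 17.05 by the clock.";
        // The full stops inside 32.8 and 17.05 are skipped because a digit,
        // not whitespace, follows them.
        System.out.println(sentenceEndOffsets(text));
    }
}
```

The lookahead means only the full stop itself is matched, so the offsets can be used directly. An abbreviation followed by a capitalised word (for example a title before a name) would still be misread as a sentence end, so this rule alone does not cover every case.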
Hi, many thanks.
I tried the regex you had provided.
Here is the input:
What I need is that the regex should identify only sentences delimited with a full-stop.
The expected output would be:
and not for example
The Regex which you furnished and which I applied as a Unix regex gave me the following:
I tried quite a few tweaks but they made it worse.
Any workarounds, please? I have a huge database with strings of this type and need to identify the valid ones.
Many thanks
Sorry, my connection was down and I could not acknowledge your answer.
Many thanks for the script. The only hassle is that it needs to be a regex, since I need to process data dynamically on the fly and not offline using sed.
Any suggestions?
I did tweak your regex to suit my needs but drew a blank.
A regex will match something; then what?
I don't understand what you mean by a regex that processes data dynamically on the fly.
Can you give me an example, please?
Even if it is "dynamic on the fly", the goal is to replace each sentence-ending full stop with full stop <new-line>.
I don't see how you can do that with a regex alone. Are you using Perl?
Perl:
perl -pe 's/\. ([A-Z])/.\n$1/g'
$ perl -pe 's/\. ([A-Z])/.\n$1/g' input-file
The temperature was 32.8 degrees Celsius.
His B.Sc. degree was deemed insufficient.
He owed the bank USD 4000.50 which he had not paid back.
On 27.07.2004 a major earthquake occurred.
It was 17.05 by the clock.
Hi,
Many thanks for the regex. I will try it out and get back to you. By "on the fly", I meant that the regex is embedded in a Java string, which in turn queries a website and returns full sentences for searching and indexing.
This is why a Perl script would not help, since it would mean calling the script. I will try and see if the script can be called from Java, but the open source software we are using demands a regex and hence the request.
Many thanks
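Since the regex has to live inside a Java string, the Perl substitution above might be carried over roughly as follows. This is only a sketch: the class name `SentenceSplitter` and its methods are hypothetical, and the lookaround variant passed to `split` is my own adaptation rather than something from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a Java rendering of the Perl one-liner above.
class SentenceSplitter {
    // Same pattern as the Perl substitution: a full stop, one space,
    // then an uppercase letter captured as group 1.
    private static final Pattern BOUNDARY = Pattern.compile("\\. ([A-Z])");

    // Replace each matched boundary with a full stop and a newline,
    // keeping the captured uppercase letter.
    static String breakSentences(String text) {
        return BOUNDARY.matcher(text).replaceAll(".\n$1");
    }

    // Assumed variant for tools that accept only a bare split regex:
    // the lookbehind and lookahead consume nothing but the space, so the
    // full stop and the capital letter stay in the resulting pieces.
    static String[] splitSentences(String text) {
        return text.split("(?<=\\.) (?=[A-Z])");
    }

    public static void main(String[] args) {
        String text = "The temperature was 32.8 degrees Celsius. "
                + "His B.Sc. degree was deemed insufficient. "
                + "He owed the bank USD 4000.50 which he had not paid back. "
                + "On 27.07.2004 a major earthquake occurred. "
                + "It was 17.05 by the clock.";
        System.out.println(breakSentences(text));
    }
}
```

The full stops in 32.8, 4000.50, 27.07.2004 and 17.05 are left alone because no space follows them, and "B.Sc. degree" survives because the next word starts with a lowercase letter. An abbreviation followed by a capitalised word would still be split, so the pattern may need an exclusion list for such cases.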