I am interpreting your requirement to mean: count all of the . ! and ? characters in a file.
This is part of what it means to find sentences. It will have problems, e.g. in text with numbers that have decimals in them, and with sentences that end in an ellipsis.... < that is one! Neat. I made a self-referential sentence.
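To make the two problem cases concrete, here is a minimal Python sketch (not the code posted earlier in the thread). The function names and the sample string are made up for illustration. The first function does the plain character count. The second is one possible refinement: it counts a run of terminators once and skips a dot that is directly followed by a digit. Abbreviations such as "Mr." would still be counted, so treat it as a rough heuristic only.

```python
import re

def count_terminators(text):
    """Count every '.', '!' and '?' character in text."""
    return len(re.findall(r"[.!?]", text))

def count_sentence_ends(text):
    """Count runs of terminators once, skipping a dot followed by a digit.

    Heuristic only: abbreviations such as "Mr." are still counted.
    """
    # A run of terminators that is not followed by another terminator
    # (so "...." is a single match) and not followed by a digit
    # (so the dot in "3.14" is skipped).
    return len(re.findall(r"[.!?]+(?![.!?\d])", text))

# Hypothetical sample text containing a decimal and an ellipsis.
sample = "Pi is about 3.14. Really? Yes! And so on...."

print(count_terminators(sample))    # raw character count
print(count_sentence_ends(sample))  # refined count
```

On this sample the raw count is inflated by the decimal point and by the four dots of the ellipsis, while the refined count treats each of those cases as intended.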
Thank you very much for the code.
I also have to break the files into one sentence per line, and I don't want it to split a line where the "." is part of a word or number, so I have to know how to identify those cases.
Can you explain this bit, please?
I have been looking at the topic of processing English sentences lately. Here is a demonstration of a Perl script that places sentences on separate lines (minimal version):
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
perl 5.10.0
divepm (local) 1.2
-----
Perl modules:
1.04 strict
1.06 warnings
0.03 Perl6::Slurp
0.25 Lingua::EN::Sentence
-----
Input data file data5:
Now is the time
for all good men
to come to the aid
of their country.
Gobble, gobble.
Mr. Erickson said to Dr.
Olson, "Pi is approximated by 3.1415, that's S.O.P.". The AAA
came out to change my tire! Isn't that great?
-----
Results:
1) Now is the time
for all good men
to come to the aid
of their country.
Now is the time for all good men to come to the aid of their country.
2) Gobble, gobble.
Gobble, gobble.
3) Mr. Erickson said to Dr.
Olson, "Pi is approximated by 3.1415, that's S.O.P.".
Mr. Erickson said to Dr. Olson, "Pi is approximated by 3.1415, that's S.O.P.".
4) The AAA
came out to change my tire!
The AAA came out to change my tire!
5) Isn't that great?
Isn't that great?
The uploaded file needs to be copied to a file named minimal-sese and then made executable. The Perl module Lingua/EN/Sentence.pm may be available in your repository; otherwise it needs to be copied from the URL noted in the script comments.
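For readers who cannot install the module, the general idea can be sketched in a few lines of Python. This is not the algorithm used by Lingua::EN::Sentence, and the abbreviation list and function name are made up for illustration. The sketch joins the wrapped lines, treats a run of . ! ? followed by whitespace as a possible sentence end, and rejects the split when the word before it is a known abbreviation. A dot inside a number such as 3.1415 is never followed by whitespace, so it is never considered.

```python
import re

# Hypothetical, deliberately tiny abbreviation list; a real one is much longer.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st"}

# Candidate boundary: terminator(s), optional closing quotes/brackets, whitespace.
BOUNDARY = re.compile(r"""[.!?]+["')\]]*\s+""")

def split_sentences(text):
    """Return the sentences of text, one string per sentence."""
    # Undo line wrapping: collapse all whitespace to single spaces.
    flat = " ".join(text.split())
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(flat):
        candidate = flat[start:match.end()].strip()
        # Word just before the boundary, without surrounding punctuation.
        last_word = candidate.split()[-1].strip(""".!?"'()[]""").lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Mr." or "Dr.": not a sentence end
        sentences.append(candidate)
        start = match.end()
    tail = flat[start:].strip()
    if tail:
        sentences.append(tail)  # last sentence has no trailing whitespace
    return sentences

# The sample input from the demonstration above.
data5 = """Now is the time
for all good men
to come to the aid
of their country.
Gobble, gobble.
Mr. Erickson said to Dr.
Olson, "Pi is approximated by 3.1415, that's S.O.P.". The AAA
came out to change my tire! Isn't that great?
"""

result = split_sentences(data5)
for sentence in result:
    print(sentence)
```

On the sample input this yields the same five sentences as the Perl demonstration. It will misfire on abbreviations missing from the list and on sentences that genuinely end in an abbreviation, which is why a maintained module is the better choice for real data.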
Posting samples of your input and desired output will help invite on-point solutions.