How to identify sentences in a text?

Hi,

I have to identify sentences from this text.

If I split the text this way:

@sentence = split(/\.\W*/, $text);

I also get the following fragments in the output, along with the proper sentences:

Biol Reprod.

2002 Mar;66(3):785-95.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.

Department of Science for Laboratory Animal Experimentation, Research Institute for Microbial Diseases, Osaka University, Suita City, Osaka 565-0871, Japan.

Research Support, Non-U.S.

I want only the proper sentences.

How can I identify proper sentences in Perl?

I would rather not use any modules. Can this be done without them?

Here is the text:

1: Biol Reprod. 2002 Mar;66(3):785-95.

Molecular cloning and characterization of a complementary DNA encoding sperm tail
protein SHIPPO 1.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.(Author's names)

Department of Science for Laboratory Animal Experimentation, Research Institute for Microbial Diseases, Osaka University, Suita City, Osaka 565-0871, Japan.

Formation of the tail in developing sperm is a complex process involving the organization of the axoneme, transport of periaxonemal proteins from the
cytoplasm to the tail, and assembly of the outer dense fibers and fibrous sheath.Although detailed morphological descriptions of these events are available, the molecular mechanisms remain to be fully elucidated. We have isolated a new gene, named shippo 1, from a haploid germ cell-specific cDNA library of mouse testis,and also its human orthologue (h-shippo 1). The isolated cDNA is 1.2 kilobases long, carrying a 762-base pair open reading frame that encodes SHIPPO 1, a sperm protein predicted to consist of 254 amino acids. The amino acid sequence includes 6 Pro-Gly-Pro repeats, which are also present in the human orthologue protein (hSHIPPO 1) as well as in 2 other newly reported proteins of Drosophila melanogaster. Transcription of shippo 1 is exclusively observed in haploid germ cells. Antibody raised against SHIPPO 1 identified a testis-specific M(r) 32 x 10(-3) band in Western blot analysis. The protein was further localized in the flagella of the elongated spermatids and along the entire length of the tail in mature sperm. SHIPPO 1 in sperm is resistant to treatment with nonionic detergents and coextracted with the cytoskeletal core proteins of the mouse sperm tail.

Publication Types:
Research Support

ID:1187

Please tell me how to identify sentences.

With regards,
Vanitha

If you have to do a lot of these, you are in trouble, IMO.

Telling sentences apart from scientific citations requires some sort of AI. You would have to identify a block of text that ends in a period and has a subject and a predicate. Either that, or create some sort of monstrous filter that traps every single journal and author name.
It would be easier to simply edit the file by hand.
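
That said, if every record looks like the one you posted, a crude filter with no modules might get you part of the way. This is only a minimal sketch under assumptions of mine: blank lines separate the blocks, a sentence break is a period followed by a capital letter, and real sentences are mostly lower-case words while author and address lines are mostly capitalised. The names looks_like_sentence and extract_sentences and both thresholds (5 words, 50% lower case) are guesses for illustration, not anything standard.

```perl
use strict;
use warnings;

# Guess whether a candidate is running prose rather than a citation fragment.
sub looks_like_sentence {
    my ($candidate) = @_;
    # Must start with a capital letter and end with a period.
    return 0 unless $candidate =~ /^[A-Z].*\.$/s;
    my @words = split ' ', $candidate;
    my $total = scalar @words;
    return 0 if $total < 5;                  # assumed minimum length
    my $lower = grep { /^[a-z]/ } @words;    # count words starting in lower case
    # Author and address lines are mostly capitalised words (assumed 50% cutoff).
    return ($lower / $total >= 0.5) ? 1 : 0;
}

# Return the candidates from $text that pass the check above.
sub extract_sentences {
    my ($text) = @_;
    my @sentences;
    for my $para (split /\n\s*\n/, $text) {  # blank lines separate the blocks
        $para =~ s/\s+/ /g;                  # undo line wrapping inside a block
        $para =~ s/^ | $//g;                 # trim the ends
        # Break after a period when the next non-space character is a capital,
        # so "1.2 kilobases" stays whole and "sheath.Although" is still split.
        for my $candidate (split /(?<=\.)\s*(?=[A-Z])/, $para) {
            push @sentences, $candidate if looks_like_sentence($candidate);
        }
    }
    return @sentences;
}

# A few lines taken from the posted record.
my $text = <<'END';
1: Biol Reprod. 2002 Mar;66(3):785-95.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.

The isolated cDNA is 1.2 kilobases long, carrying a 762-base pair open reading
frame that encodes SHIPPO 1, a sperm protein predicted to consist of 254 amino
acids. Transcription of shippo 1 is exclusively observed in haploid germ cells.
END

print "$_\n" for extract_sentences($text);
```

On that sample it should drop the journal line and the author line and keep the two abstract sentences. Be warned that the article title is mostly lower-case words too, so it will slip through, and abbreviations such as "U.S." will be cut in the wrong place. For anything beyond a handful of similar records, the point above still stands.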

Hi,

Thanks for the reply.
So there is no other way, then!