Need Help with GREP REGEX scripts for common BB-EDIT text-editing

TheMacGuy · March 17, 2012, 3:44am

Hi Everybody..

I'm a "newbie" to using the command line... A few half-remembered DOS commands from 30 years ago, and the very handy "sudo rm -R pathname" REMOVE command...

I do a lot of "cleaning" of plain-text OCR text files with assorted common
line-break, punctuation and capitalization errors.

IF there's a "recipe book" of simple GREP commands that are "obvious how to use" (for a newbie), I'd love to see it!!! (haven't found it YET!)

Meanwhile, here's the part that's giving me a migraine.. Help, Please!

-----------

I'm having trouble figuring out how to "clean" a text-file of extraneous formatting problems using GREP commands in my "BB-EDIT" (Macintosh) text-program.

I'm trying to clean out a pair of carriage-returns in the middle of a "broken" paragraph (lowercase letter ending para.1, and lowercase letter starting para.2, with NO PUNCTUATION in between, just two line-break \r characters)...

My ATTEMPT isn't quite working.
I'm trying to use the GREP command [a-z]\r{2}

to replace the two line-breaks between paragraph1 and paragraph2, (that is, the anchorpoint is the LAST l/c character of the FIRST part of the broken-paragraph)
-----without affecting the end or start letters of the two paragraphs....

THIS GREP STRING **IS** finding the EXACTLY TWO carriage returns PRECEDED BY a l.c. letter [a-z]

But it is *NOT* "remembering" that PRECEDING lower-case letter....

So, "Mary had a little lamb

who had snowy green fleece...."

is being replaced with "Mary had a little lam who had snowy green fleece....."
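
The lost letter can be reproduced outside BBEdit. This is only a minimal Python sketch: it assumes Python's re module treats this pattern the same way BBEdit's grep does, and that the replacement text is a single space.

```python
import re

# Sample text from the post, using carriage returns (\r) as line breaks.
text = "Mary had a little lamb\r\rwho had snowy green fleece"

# The pattern matches the lowercase letter AND the two returns, so all
# three characters are swapped for the replacement, letter included.
broken = re.sub(r"[a-z]\r{2}", " ", text)
print(broken)  # Mary had a little lam who had snowy green fleece
```

Nothing in the pattern tells the engine to keep the letter. It is part of the match, so it is replaced along with the returns.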

Does anybody have such a GREP pattern (and a simple explanation of it, if possible!) that will find [a-z]\r\r[a-z] and REPLACE the two carriage returns with a single space, WITHOUT affecting the two lowercase letters at the end of paragraph1 and the beginning of paragraph2?

Any Ideas how I can fix this??? Please advise!! Thank-you!!

TRY THIS TOO:

Pattern Matches
(p) the pattern p and remembers it
(?P<NAME>p) the pattern p and remembers it by the specified string NAME

So, if I'm reading this correctly, modelling from my "broken" expression above it should be:
Find: ([a-z]\r{2})
Replace: [a-z\r{2}]

---Nope, that doesn't work (for me) either.... Something's wrong here, but what???
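
One way to read the "(p) remembers it" table above: the parentheses go around only the part to keep (each letter), and the replacement then refers to the remembered pieces by number or by name. Here is a hedged sketch in Python, whose re module uses the same (p) and (?P<NAME>p) notation. The group names "before" and "after" are made up for illustration, and BBEdit's Replace field is assumed, not verified here, to accept \1 and \2 in the same way.

```python
import re

text = "Mary had a little lamb\r\rwho had snowy green fleece"

# Group 1 remembers the last letter of paragraph 1,
# group 2 remembers the first letter of paragraph 2.
# The replacement puts both letters back with one space between them.
fixed = re.sub(r"([a-z])\r{2}([a-z])", r"\1 \2", text)

# The same idea with named groups (names are hypothetical).
fixed_named = re.sub(
    r"(?P<before>[a-z])\r{2}(?P<after>[a-z])",
    r"\g<before> \g<after>",
    text,
)

# Paragraph breaks that follow punctuation are left alone.
untouched = re.sub(r"([a-z])\r{2}([a-z])", r"\1 \2", "The end.\r\rnext part")

print(fixed)  # Mary had a little lamb who had snowy green fleece
```

The Replace attempt above puts a pattern in the replacement field. A replacement is literal text plus references to remembered groups, so [a-z] there does not mean "the letter that was found".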

methyl · March 21, 2012, 9:43am

I don't think that the command-line grep command is going to help, because grep only searches and never changes the file.
You will need a sed command to read your input file, edit the data and produce a new corrected output file.
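
The read, edit, write-a-new-file flow described here can be sketched as follows. This uses Python instead of sed (a swapped-in tool, not the sed command being suggested). The function name, the file names and the choice to accept CR, LF or CRLF line breaks are all assumptions made for illustration.

```python
import os
import re
import tempfile

# Lowercase letter, exactly two line breaks of one kind, lowercase letter.
# The three alternatives cover CR, LF and CRLF line endings.
BROKEN_PARAGRAPH = re.compile(r"([a-z])(?:\r\n\r\n|\r\r|\n\n)([a-z])")

def join_broken_paragraphs(src_path, dst_path):
    """Read src_path, join broken paragraphs, write the result to dst_path."""
    # newline="" switches off newline translation, so \r characters survive.
    with open(src_path, newline="", encoding="utf-8") as src:
        text = src.read()
    fixed = BROKEN_PARAGRAPH.sub(r"\1 \2", text)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.write(fixed)
    return fixed

# Self-contained demo on temporary files (hypothetical file names).
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "ocr_input.txt")
    dst = os.path.join(folder, "ocr_output.txt")
    with open(src, "w", newline="", encoding="utf-8") as f:
        f.write("Mary had a little lamb\r\rwho had snowy green fleece.\r\rThe end.\r")
    result = join_broken_paragraphs(src, dst)
    with open(dst, newline="", encoding="utf-8") as f:
        written = f.read()
```

The input file is never modified. The corrected text goes to a separate output file, so the original stays available if the pattern turns out to be too greedy.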

Best suggestion is to re-post in Shell Scripting after condensing your post to the basics.
The normal line terminator for text files on Mac OS depends on the operating system version. Please post the exact version of your O/S and state the normal line terminator in a text file for that version. Please also post an example input file and the expected output, and explain the process concisely.

The output from this (coincidentally) sed command on a representative sample portion of the input file should clear up any ambiguities about the text file format and make the extraneous characters visible.

sed -n l filename
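
If sed is not at hand, a rough Python sketch can answer the same question about which line terminators the file really contains. The function name and the sample bytes are made up for illustration. For a real file, the bytes would come from open(path, "rb").read().

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # Windows style
        "CR": data.count(b"\r") - crlf,    # classic Mac style
        "LF": data.count(b"\n") - crlf,    # Unix style
    }

# Hypothetical sample standing in for the start of an OCR file.
sample = b"Mary had a little lamb\r\rwho had snowy green fleece"
counts = count_line_endings(sample)

# repr() shows the otherwise invisible characters as \r and \n escapes.
visible = repr(sample)
print(counts)
print(visible)
```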