Hi,
I have OCR'ed text that needs cleaning.
Each line contains a part-of-speech (POS) marker, for example
adj. or s. f. or s. m. etc.
I need to uppercase all text before the POS marker,
except that all text within parentheses should be lowercase.
Text after (and including) the POS marker should remain as is.
filename: munge
fuiASSO, FIEIASSO (b.), fuluasso (a. l.), foulhasso (for.), (b. lat. folz�acia), s. f. grosse feuille,
FUMFULHUT (l.), felhut (g.), FOULhuolhut, (it.) FOGLIUTO, adj. Feuillu, ue, v. uiaru, pampous,
FUIEMT, fuiret (rh.), fulheiret, ramoner (l.), fulhoret (rouerg.), s. m. Feuilleret, petit rabot qui sert faire des feuillures.
FULmjnacioun, FULMINACIEN (m.), fulminacieu (l.), (rom. lat. fulminatzo, cat. fulminaci�, esp. fulminacion, it. fwlminasione), s. f. Fulmination, v. trounado.
FULMINANT, ANTO (port. fulminante), adj. Fulminant, ante, v. trounant. R. fulmana.
I have uppercased everything before the POS with
sed -r -i -f doup.sed munge
where doup.sed contains:
s/ n. de l. /^ n. de l. /
s/ s. m. /^ s. m. /
s/ s. f. /^ s. f. /
s/ adj. /^ adj. /
s/ n. p. /^ n. p. /
s/ v. n. /^ v. n. /
s/ v. a. /^ v. a. /
s/ adv. /^ adv. /
s/^(.*)\^/\U\1\E/
and then tried to lowercase the text between the parentheses with
sed -r -i 's/\((.*)\)/\L&/g' munge
but this keeps the uppercasing only up to the first parenthesis and lowercases everything else up to the POS, like:
FUIASSO, FIEIASSO (b.), fuluasso (a. l.), foulhasso (for.), (b. lat. folz�acia), s. f. grosse feuille,
etc
etc
Any GNU sed 4.2.2 or GAWK 4.1.3 solutions, please.
Thanks in advance
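
For reference, the sed attempt above misbehaves because .* is greedy: \((.*)\) matches from the first opening parenthesis to the last closing parenthesis on the line, so everything in between is lowercased. A minimal awk sketch of one way around this follows. It is a sketch under assumptions, not a tested fix for the whole file: it reuses the POS list from doup.sed (with the dots escaped), treats the first marker found on a line as the split point, and prints lines without a marker unchanged. The function name fix_case is hypothetical. Only standard awk functions are used (match, substr, toupper, tolower), so it should run under GAWK 4.1.3. Accented or mis-encoded bytes may be left untouched, depending on the locale.

```shell
# fix_case: hypothetical helper name. Reads the named files (or stdin) and
# writes the converted lines to stdout.
fix_case() {
    awk '
    BEGIN {
        # POS markers taken from doup.sed, dots escaped; extend as needed.
        pos = " (n\\. de l\\.|s\\. m\\.|s\\. f\\.|adj\\.|n\\. p\\.|v\\. n\\.|v\\. a\\.|adv\\.) "
    }
    {
        # Lines without a POS marker are printed unchanged.
        if (!match($0, pos)) { print; next }

        # Split at the first marker: head is uppercased, tail stays as is.
        head = toupper(substr($0, 1, RSTART - 1))
        tail = substr($0, RSTART)

        # Lowercase each (...) group in the head, one group at a time.
        # [^)]* stops at the nearest closing parenthesis, unlike .*
        out = ""
        while (match(head, /\([^)]*\)/)) {
            out = out substr(head, 1, RSTART - 1) tolower(substr(head, RSTART, RLENGTH))
            head = substr(head, RSTART + RLENGTH)
        }
        print out head tail
    }
    ' "$@"
}
```

Usage, writing to a new file rather than editing in place: fix_case munge > munge.new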