Extracting strings surrounded by parentheses and separated by commas

kpfeif · January 16, 2009, 9:55pm

Excuse the terrible title.

I have a text file of 1..n lines, each one containing at least one string between parentheses. Within each parenthesised string there are one or more strings separated by commas. I need to extract each string, thus:

input file:

(THIS,THAT)
(THE,OTHER)
(THING)
(OR,MAYBE)
(THIS,THING)

Would result in:

THIS
THAT
THE
OTHER
THING
OR
MAYBE
THIS
THING

I'm pulling some stuff over from an IBM mainframe to Unix and I need to do some work on the resulting strings.

awk is it...I just suck at it.

Kindest regards,
Kris
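Since awk was the tool asked for, here is a minimal awk sketch. It assumes the input looks like the sample above (one parenthesised, comma-separated group per line) and that the strings themselves contain no parentheses or commas. The function name extract_strings and the file name input.txt are hypothetical.

```shell
#!/bin/sh
# Print each comma-separated string found between parentheses in the
# file named by $1, one per line. Assumes one "(...)" group per line
# and no parentheses or commas inside the strings themselves.
extract_strings() {
    # "(", ")" and "," all act as field separators; the empty fields
    # produced by the leading "(" and trailing ")" are skipped.
    awk -F'[(),]' '{
        for (i = 1; i <= NF; i++)
            if ($i != "")
                print $i
    }' "$1"
}

# Hypothetical sample file matching the input shown above.
printf '%s\n' '(THIS,THAT)' '(THE,OTHER)' '(THING)' '(OR,MAYBE)' '(THIS,THING)' > input.txt
extract_strings input.txt
```

Because -F is given more than one character, awk treats it as a regular expression, so any one of the three characters splits a field. Skipping empty fields is what keeps a blank line from appearing at the top of the output.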

cfajohnson · January 16, 2009, 10:55pm

tr -sc 'a-zA-Z' '\n' < FILE

MarkR · January 17, 2009, 2:39pm

You might want to add something to delete the first line of the output. tr does a great job, but it also translates the opening bracket on the first line into a newline, so you end up with an extra blank line before your data.

tr -sc 'a-zA-Z' '\n' < FILE

THIS
THAT
THE
OTHER
THING
OR
MAYBE
THIS
THING

sed does the trick:

tr -sc 'a-zA-Z' '\n' < FILE | sed '1d'
THIS
THAT
THE
OTHER
THING
OR
MAYBE
THIS
THING

Rhije · January 17, 2009, 3:06pm

Maybe I am not thinking clearly...

How does tr work? I mean, I tested it, and it clearly works, so maybe I do not understand its arguments. The man pages make it seem as though all it does is replace SET1 with SET2, so how does tr -sc 'a-zA-Z' '\n' end up replacing just the parentheses and commas? It seems to be replacing the opposite of SET1 (grep -v comes to mind).

-c, -C, --complement
first complement SET1

-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character

cfajohnson · January 17, 2009, 4:01pm

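-c complements SET1, so it replaces every character that is not in 'a-zA-Z' (the parentheses, the commas and the line endings) with a newline. -s then squeezes each run of those newlines into a single one, so the closing parenthesis, line ending and opening parenthesis between two groups collapse into one line break.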
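To see what each option contributes, the same one-line sample can be run through tr with and without -s. This is only an illustrative sketch using a single line from the sample input.

```shell
#!/bin/sh
line='(THIS,THAT)'

# -c alone: every character NOT in a-zA-Z ("(", ",", ")" and the
# trailing newline added by printf) becomes a newline, so the
# output is: empty line, THIS, THAT, empty line.
printf '%s\n' "$line" | tr -c 'a-zA-Z' '\n'

echo '---'

# -c plus -s: runs of the replacement newline are squeezed to one,
# leaving only the empty first line produced by the leading "(".
printf '%s\n' "$line" | tr -sc 'a-zA-Z' '\n'
```

The leftover empty line in the second run is the one that sed '1d' (or ${list#?}) removes.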

cfajohnson · January 17, 2009, 4:04pm

If the result is being pulled into a variable, there's no need for sed:

list=$( tr -sc 'a-zA-Z' '\n' < FILE )
list=${list#?}
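Once the words are in list, one way to work on them one at a time is to feed the variable to a while read loop. A minimal sketch, assuming a hypothetical input.txt with part of the sample data; the echo stands in for whatever processing is actually needed.

```shell
#!/bin/sh
# Hypothetical sample input.
printf '%s\n' '(THIS,THAT)' '(THE,OTHER)' '(THING)' > input.txt

list=$( tr -sc 'a-zA-Z' '\n' < input.txt )  # $( ) drops the trailing newline
list=${list#?}                              # drop the leading newline left by "("

# Handle each word on its own line; printf is used instead of echo
# so the contents of $list are printed exactly as they are.
printf '%s\n' "$list" | while IFS= read -r word; do
    echo "got: $word"
done
```

Because the loop sits on the right of a pipe, shells such as bash and dash run it in a subshell, so variables set inside the loop are not visible after it finishes.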

summer_cherry · January 18, 2009, 9:43pm

sed 's/[()]//g' a.txt | tr "," "\n"
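The two approaches in this thread differ once a string contains anything other than letters. A quick comparison, using a made-up line with a digit and a period (hypothetical data, not from the original post):

```shell
#!/bin/sh
# Hypothetical line whose first string contains a digit and a period.
sample='(SYS1.PROCLIB,ABC)'

# Letters-only tr: every non-letter is a separator, so "SYS1.PROCLIB"
# is split into SYS and PROCLIB (and the leading "(" still yields an
# empty first line).
printf '%s\n' "$sample" | tr -sc 'a-zA-Z' '\n'

echo '---'

# sed + tr: only the parentheses are removed and only commas become
# newlines, so each string survives intact and no blank line appears.
printf '%s\n' "$sample" | sed 's/[()]//g' | tr ',' '\n'
```

For data that is guaranteed to be letters only, both give the same words; otherwise the sed + tr version keeps the strings whole.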