Extract multiple occurrences of strings between 2 patterns

sameermohite · October 24, 2013, 4:22am

I need to extract multiple occurrences of strings between two different patterns in a given line.

For example, given the input below:
-------------------------------------------------------------------------------------

mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)

-------------------------------------------------------------------------------------

I need the output to be:
-------------------------------------------------------------------------------------

hussey donald ryan johnson

-------------------------------------------------------------------------------------

Condition 1: Extract the text between parentheses.
Condition 2: The number of pattern matches is unknown.

I tried the command below using awk, but it extracts only the first match.

awk -F'[(|)]' '{print $2}' filename

Skrynesaver · October 24, 2013, 4:30am

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | perl -ne '@surnames=$_=~/\(([^)]+)\)/g;print join"\n",@surnames' -
hussey
donald
ryan
johnson

krishmaths · October 24, 2013, 5:06am

awk solution (can be made better)

awk '{for(i=1;i<=NF;i++) {if($i~/\(/) a=a" "substr($i,1+index($i,"("))} }{gsub(/\)/,"",a);print a;a=""}' infile

Don_Cragun · October 24, 2013, 5:11am

To get the other fields (and print the values from each input line on a single output line) try:

awk -F '[()]' '
{       for(i = 2; i < NF; i += 2)
                printf("%s%s", $i, i == NF - 1 ? "\n" : " ")
}' filename

Note that (even though it doesn't matter with the sample input you provided) you don't want the pipe symbol in the list of characters to be treated as field separators in your -F option option-argument.
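
To illustrate the point about the pipe symbol: inside a bracket expression, | is an ordinary character rather than alternation, so -F'[(|)]' also splits on any literal | in the data. A minimal sketch follows; the input line with a|b in it is invented for the demonstration and does not come from the original post.

```shell
# Hypothetical input line containing a literal pipe before the parentheses.
line='a|b mike(hussey) AND mike(donald)'

# With "|" in the bracket expression, the pipe also acts as a field
# separator, so $2 is no longer the first parenthesized string.
with_pipe=$(printf '%s\n' "$line" | awk -F'[(|)]' '{print $2}')

# With only "(" and ")" as separators, $2 is the first parenthesized string.
without_pipe=$(printf '%s\n' "$line" | awk -F'[()]' '{print $2}')

printf 'with pipe:    %s\n' "$with_pipe"
printf 'without pipe: %s\n' "$without_pipe"
```

Here the first command yields "b mike" and the second yields "hussey".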

Akshay_Hegde · October 24, 2013, 6:00am

Try

$ echo "mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)" | awk '{gsub("[A-Za-z]*[(]|[)]*[A-Za-z]*[ ]|[)]"," ")}1'

Result:

 hussey   donald   ryan   johnson

Or, if spacing is important:

$ echo "mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)" | awk '{gsub("[A-Za-z]*[(]","");gsub("[)]*[A-Za-z]*[ ]|[)]"," ")}1'

Result:

hussey  donald  ryan  johnson

disedorgue · October 24, 2013, 6:08am

Hi,
A sed solution:

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | sed '/\(^[^(]*(\|)[^(]*(\|)[^(]*\)/s// /g;s/ //'
hussey donald ryan johnson

Another awk solution:

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | awk -F'[()]' '{while((i+=2)<NF) {x=x T $i;T=" "} print x}'
hussey donald ryan johnson

Regards.

sameermohite · October 24, 2013, 6:23am

Hi all,
Thank you very much for the prompt replies.

@krishmaths - your answer suits my requirement perfectly.
@Skrynesaver - the perl solution also works, but it prints the output on separate lines.
@Don cragon - it works only when the number of columns is known.

Thanks Akshay, it works perfectly.

Akshay_Hegde · October 24, 2013, 6:29am

@sameermohite

Please edit post #1 and use code tags. Don't leave this work for the moderators; they are busy helping many people like you and me.

Don_Cragun · October 24, 2013, 6:41am

I assume by @Don cragon you mean me. I don't understand your comment. In the tests I ran, it works on any input line where there are one or more matched pairs of opening and closing parentheses. For example, with the input:

mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)(11)
nothing to print from this line
lots of junk (real data) more junk

it prints:

hussey donald ryan johnson
1 2 3 4 5 6 7 8 9 10 11
real data

which seems to match what you requested. You didn't say what should happen when there are no parentheses.
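
If an empty output line is wanted for input lines that have no parentheses (an assumption, since the thread does not say what should happen), one possible variant builds the output string first and always prints it. This is a sketch only, and the function name extract_or_blank is made up for illustration.

```shell
# Hypothetical helper (name is an assumption): print the text inside each
# matched (...) pair, space-separated, one output line per input line.
# Lines without a matched pair produce an empty output line.
extract_or_blank() {
    awk -F '[()]' '{
        out = ""
        sep = ""
        # Even-numbered fields before the last one lie between "(" and ")".
        for (i = 2; i < NF; i += 2) {
            out = out sep $i
            sep = " "
        }
        print out
    }' "$@"
}

printf '%s\n' 'mike(hussey) AND mike(donald)' \
              'nothing to print from this line' \
              'lots of junk (real data) more junk' | extract_or_blank
```

With no file arguments the function reads standard input; file names can be passed instead, as in extract_or_blank filename.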