Extract multiple occurrences of strings between two patterns

I need to extract every string that occurs between two different patterns on a given line.

For example, given this input:
-------------------------------------------------------------------------------------

mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)

-------------------------------------------------------------------------------------

I need the output to be:
-------------------------------------------------------------------------------------

hussey donald ryan johnson

-------------------------------------------------------------------------------------

Condition 1: Extract the text between parentheses.
Condition 2: The number of pattern matches is unknown.

I tried the awk command below, but it extracts only the first match.

awk -F'[(|)]' '{print $2}' filename

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | perl -ne '@surnames=$_=~/\(([^)]+)\)/g;print join"\n",@surnames' -
hussey
donald
ryan
johnson

An awk solution (it can be improved):

awk '{for(i=1;i<=NF;i++) {if($i~/\(/) a=a" "substr($i,1+index($i,"("))} }{gsub(/\)/,"",a);print a;a=""}' infile

To get the other fields (and print the values from each input line on a single output line) try:

awk -F '[()]' '
{       for(i = 2; i < NF; i += 2)
                printf("%s%s", $i, i == NF - 1 ? "\n" : " ")
}' filename

Note that (even though it doesn't matter with the sample input you provided) you don't want the pipe symbol in the bracket expression you pass to -F. Inside brackets the pipe is a literal character, not an alternation operator, so it would also be treated as a field separator.
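
To illustrate that note, here is a small sketch using a made-up input line (not from the thread) with a pipe between the parentheses. The variable names are only for illustration.

```shell
# Hypothetical input containing a literal pipe inside the parentheses.
line='mike(hus|sey) AND mike(donald)'

# With '|' in the bracket expression, the pipe also splits fields.
with_pipe=$(printf '%s\n' "$line" | awk -F '[(|)]' '{ print $2 }')

# With only the parentheses listed, the text between them stays intact.
without_pipe=$(printf '%s\n' "$line" | awk -F '[()]' '{ print $2 }')

printf '%s\n' "$with_pipe" "$without_pipe"
```

The first line printed is hus and the second is hus|sey, so the extra character in the separator list silently truncates the value.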

Try

$ echo "mike(hussey) AND mike(donald) AND mike(ryan) ANDa mike(johnson)" | awk '{gsub("[A-Za-z]*[(]|[)]*[A-Za-z]*[ ]|[)]"," ")}1'

Resulting in:

 hussey   donald   ryan   johnson 

Or, if spacing is important:

$ echo "mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)" | awk '{gsub("[A-Za-z]*[(]","");gsub("[)]*[A-Za-z]*[ ]|[)]"," ")}1'

Resulting in:

hussey  donald  ryan  johnson

Hi,
A sed solution:

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | sed '/\(^[^(]*(\|)[^(]*(\|)[^(]*\)/s// /g;s/ //'
hussey donald ryan johnson

Another awk solution:

$ echo 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | awk -F'[()]' '{while((i+=2)<NF) {x=x T $i;T=" "} print x}'
hussey donald ryan johnson

Regards.
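
One caveat on the awk one-liner above: i and x are never reset, so it appears to be intended for single-line input such as the echo shown. On a multi-line file the counter keeps growing and the first line's result is printed again. A possible per-line variant is sketched below; the function name extract_per_line is made up for this example.

```shell
# Sketch: same field-splitting idea, but the accumulator and separator
# are reset for every input line. extract_per_line is a hypothetical name.
extract_per_line() {
    awk -F'[()]' '{
        x = ""; sep = ""
        # Even-numbered fields (except the last) lie between "(" and ")".
        for (i = 2; i < NF; i += 2) { x = x sep $i; sep = " " }
        print x
    }' "$@"
}

printf '%s\n' 'mike(hussey) AND mike(donald)' 'x(1) y(2)' | extract_per_line
```

Note that this variant prints an empty line for input lines that contain no parentheses.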

Hi all,
Thank you very much for the prompt replies.

@krishmaths - your answer suits my requirement perfectly.
@skrynesaver - the perl version also works, but it prints each match on a separate line.
@Don cragon - it works only when the number of columns is known.

Thanks Akshay, it works perfectly.

@sameermohite

Please edit post #1 and use code tags. Don't leave this work to the moderators; they are busy helping many people like you and me.

I assume by @Don cragon you mean me. I don't understand your comment. In the tests I ran, it works on any input line where there are one or more matched pairs of opening and closing parentheses. For example, with the input:

mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)(11)
nothing to print from this line
lots of junk (real data) more junk

it prints:

hussey donald ryan johnson
1 2 3 4 5 6 7 8 9 10 11
real data

which seems to match what you requested. You didn't say what should happen when there are no parentheses.
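
For input where the parentheses might not be balanced, the field-splitting approach can pick up the wrong fields. One possible alternative is sketched below. It assumes that only complete ( ... ) pairs should be printed and that lines without any should produce no output, which the original poster did not specify. The function name extract_parens is made up for this example.

```shell
# Sketch: collect only complete "(...)" groups on each line.
# extract_parens is a hypothetical name.
extract_parens() {
    awk '{
        out = ""; n = 0; rest = $0
        # match() sets RSTART and RLENGTH for the leftmost "(...)" group.
        while (match(rest, /\([^()]*\)/)) {
            inner = substr(rest, RSTART + 1, RLENGTH - 2)  # drop "(" and ")"
            out = (n++ ? out " " inner : inner)
            rest = substr(rest, RSTART + RLENGTH)          # continue after ")"
        }
        if (n) print out   # skip lines with no complete pair
    }' "$@"
}

printf '%s\n' 'mike(hussey) AND mike(donald) AND mike(ryan) AND mike(johnson)' | extract_parens
```

With the sample line this prints hussey donald ryan johnson. A line such as "a (b) c (d" yields only b, because the trailing "(d" has no closing parenthesis.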