Extract all proper names from string with awk

I want to extract the proper names with awk from a very long string, like:

ő(k): </span><br /><a something="pls/pe/person.person?i_pers_id=3694&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank"><b>Gary  Oldman</b></a> (George Smiley)<br /><a something="/pls/pe/person.person?i_pers_id=9384&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank"><b>Colin  Firth</b></a> (Bill Haydon)<br /><a something="pls/pe/person.person?i_pers_id=209372&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank"><b>Tom  Hardy</b></a> (Ricki Tarr)<br /><a something="/pls/pe/person.person?i_pers_id=10808&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">John  Hurt</a> (Control)<br /><a something="/pls/pe/person.person?i_pers_id=167105&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Toby  Jones</a> (Percy Alleline)<br /><a something="/pls/pe/person.person?i_pers_id=24870&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Mark  Strong</a> (Jim Prideaux)<br /><a something="/pls/pe/person.person?i_pers_id=219080&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Benedict  Cumberbatch</a> (Peter Guillam)<br /><a something="/pls/pe/person.person?i_pers_id=108042&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Ciarán  Hinds</a> (Roy Bland)<br /><a something="/pls/pe/person.person?i_pers_id=222906&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">David  Dencik</a> (Toby Esterhase)<br /><br />szinkronhang: <br /><a something="/pls/pe/person.person?i_pers_id=3880&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Hegedűs D. 
Géza</a> (George Smiley magyar hangja)<br /><a something="/pls/pe/person.person?i_pers_id=22939&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Csankó Zoltán</a> (Bill Haydon magyar hangja)<br /><a something="/pls/pe/person.person?i_pers_id=25860&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Viczián Ottó</a> (Ricki Tarr magyar hangja)<br /><a something="/pls/pe/person.person?i_pers_id=13098&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Tordy Géza</a> (Control magyar hangja)<br />
<a something="/pls/pe/person.person?i_pers_id=7028&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Gyabronka József</a> (Percy Alleline magyar hangja)<br /><a something="/pls/pe/person.person?i_pers_id=6444&i_topic_id=2&i_city_id=3372&i_county_id=-1" target="_blank">Széles Tamás</a> (Jim Prideaux magyar hangja)</span><br />

The output I want:
Gary Oldman
(George Smiley)
Colin Firth
(Bill Haydon)
Tom Hardy
(Ricki Tarr)
John Hurt
(Control)
Toby Jones
(Percy Alleline)
Mark Strong
(Jim Prideaux)
etc.
Thanks

---------- Post updated at 04:04 PM ---------- Previous update was at 03:55 PM ----------

My mistake: the input is a single-line string, and I changed "href" to "something", but I don't think that matters.

Try:

perl -ln0e 'while (/>([^<]+)/g) {print "$1"}' file

Unfortunately it doesn't work:
syntax error at -e line 1, near ") ("
Execution of -e aborted due to compilation errors.

Did you copy and paste it into the terminal? What operating system are you using?

No, I typed it in, but without any mistakes; I checked.
It's Debian GNU/Linux 6.0.

This is interesting, as my code does not contain the string ") (" that the error reports. Can you try copying and pasting this code into the terminal window?

Oh, sorry!
Now I see my typo: I typed ( instead of {.
So it works almost perfectly, but I need a single space between the first and last names, like

Gary Oldman

not

Gary  Oldman

.
Thanks

I don't see any difference here:

Please use code tags to preserve whitespace.

Meanwhile I edited my reply.

perl -ln0e 'while (/>([^<]+)/g) {$x=$1;$x=~s/ +/ /g;print "$x"}' file

bartus11, it works perfectly, thank you!

---------- Post updated at 04:55 PM ---------- Previous update was at 04:54 PM ----------

Does anyone know how to do it in awk?

Something like this?

awk '$0=$2' RS=\< FS=\> infile

Yes, it's working too, thank you!
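
For anyone finding this later: the awk one-liner prints the text after each tag as it is, so the doubled spaces inside the names and the leading space before each "(role)" stay in the output. Below is a minimal sketch of one way to clean that up in awk as well. It assumes the text never contains a literal "<" or ">", and the function name extract_names and the shortened sample string are made up for illustration, not taken from the original posts.

```shell
# extract_names is a hypothetical helper name, not something from the thread.
extract_names() {
    # RS='<' ends a record at every "<"; -F'>' splits it at ">",
    # so $2 holds the text that follows a tag (empty when a tag follows directly).
    awk -v RS='<' -F'>' '
    {
        text = $2
        gsub(/[ \t\n]+/, " ", text)   # squeeze each whitespace run to one space
        gsub(/^ | $/, "", text)       # drop a leading or trailing space
        if (text != "") print text    # skip records with no text after the tag
    }'
}

# Shortened, made-up sample in the same shape as the string from the question.
printf '%s' '<br /><a something="x" target="_blank"><b>Gary  Oldman</b></a> (George Smiley)<br /><a something="y" target="_blank">John  Hurt</a> (Control)<br />' | extract_names
# Prints:
# Gary Oldman
# (George Smiley)
# John Hurt
# (Control)
```

To run it on the real data, something like `extract_names < infile` should do. The bracket expression `[ \t\n]` is used rather than `[[:space:]]` because some older mawk builds do not support POSIX character classes.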