Extracting text between HTML tags

Hi everyone,
I want to extract the string that sits between a certain pair of HTML tags.
e.g.

I tried grep, cut, and awk but could not find the exact syntax for this one.

P.S. Sorry about my bad English.

Have a go with:

sed -n 's/.*<tag>//; T; s/<\/tag>.*//; T; p' input-file >output-file

This assumes the opening and closing tags are on the same line, with no newline between them.
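To see what the sed one-liner does step by step, here is a small sketch that wraps it in a shell function. It assumes GNU sed (for `T`, which with no label jumps to the end of the script when no `s` command has succeeded since the last input line was read or conditional branch was taken) and a tag literally named `tag`; the function name `extract_tag_sed` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the GNU sed one-liner (GNU sed only: uses T).
extract_tag_sed() {
  # s/.*<tag>//   : delete everything up to and including <tag>
  # T             : no opening tag -> jump to end; with -n nothing is printed
  # s/<\/tag>.*// : delete </tag> and everything after it
  # T             : no closing tag -> jump to end, nothing printed
  # p             : print what is left, i.e. the text between the tags
  sed -n 's/.*<tag>//; T; s/<\/tag>.*//; T; p' "$@"
}

printf '%s\n' '<p><tag>hello</tag></p>' 'no tags here' | extract_tag_sed
# prints: hello
```

Because `.*` is greedy, a line containing several `<tag>` pairs yields only the text of the last one.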

Or, if the text between the tags can contain newlines:

awk -F\> '/^tag>/{print $2}' RS=\< infile
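How the awk version works: setting `RS` to `<` makes every `<` start a new record, so a record looks like `tag>text` even when the text spans several lines, and `-F'>'` splits that record into the tag name (`$1`) and the text after it (`$2`). A minimal sketch with sample input follows; it passes `RS` with `-v` instead of the trailing `RS=\<` operand, which should behave the same here, and `extract_tag_awk` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner.
extract_tag_awk() {
  # -v RS='<' : each '<' ends a record, so newlines inside the
  #             tagged text stay within a single record
  # -F'>'     : $1 = tag name, $2 = text following the '>'
  # /^tag>/   : keep only records that begin with the opening tag
  awk -v RS='<' -F'>' '/^tag>/ { print $2 }' "$@"
}

printf '<p><tag>first\nsecond</tag></p>\n' | extract_tag_awk
```

Records starting with `/tag>` (the closing tag) do not match `^tag>`, so only the opening-tag record is printed.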

and if you also want to remove those newlines:

awk -F\> '/^tag>/{gsub(ORS,x);print $2}' RS=\< infile
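In this variant, `gsub(ORS, x)` deletes every occurrence of `ORS` (a newline by default) from the record; `x` is simply a variable that was never set, so it acts as an empty replacement string. Since `gsub` changes `$0`, awk re-splits the fields, and `$2` is then the joined text. A small sketch, with `extract_tag_oneline` as a hypothetical name:

```shell
#!/bin/sh
# Hypothetical wrapper: like the previous awk command, but joins multi-line text.
extract_tag_oneline() {
  # gsub(ORS, x): remove newlines (ORS defaults to "\n"); x is unset,
  # i.e. the empty string, so the newlines are deleted, not replaced
  awk -v RS='<' -F'>' '/^tag>/ { gsub(ORS, x); print $2 }' "$@"
}

printf '<p><tag>first\nsecond</tag></p>\n' | extract_tag_oneline
```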

With the tag name passed in as a variable:

awk -F\> '$0~"^"t">" {gsub(ORS,x);print $2}' RS=\< t="tag" infile
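Here the regex is built from a string at run time: `"^" t ">"` concatenates to `^tag>` when `t` is `tag`, so one script serves any tag name. Because `t` ends up inside a regular expression, a tag name containing regex metacharacters would need escaping. A sketch that takes the tag name as its first argument (the function name `extract_named_tag` is hypothetical):

```shell
#!/bin/sh
# Hypothetical wrapper: first argument is the tag name, the rest are files.
extract_named_tag() {
  tag=$1
  shift
  # "^" t ">" builds the dynamic regex ^NAME> from the variable t
  awk -v RS='<' -F'>' -v t="$tag" '$0 ~ "^" t ">" { gsub(ORS, x); print $2 }' "$@"
}

printf '<title>Home</title><h1>Welcome</h1>\n' | extract_named_tag h1
```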

@agama: note that T works in GNU sed only.


Hi, I got:

I tried man sed but could not find anything about it.

Am I missing something?

sed 's/<[^>]*>/ /g'
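This one works differently: instead of extracting the text, it deletes every tag by replacing each `<...>` with a space. `[^>]*` matches any run of characters other than `>`, so each match stops at the first closing `>` instead of swallowing the rest of the line. A quick demonstration (`strip_tags` is a made-up name):

```shell
#!/bin/sh
strip_tags() {
  # <[^>]*> : '<', then any characters except '>', then '>' (one tag)
  # g       : replace every tag on the line, not just the first one
  sed 's/<[^>]*>/ /g' "$@"
}

printf '<p><b>bold</b> text</p>\n' | strip_tags
```

It keeps all text on the line, not only the text inside one particular tag, and leaves a space wherever a tag was.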

or

grep -Po '(?<=>)\w+(?=<)'
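The grep variant relies on Perl-compatible regexes (`-P`, a GNU grep option) with lookarounds: `(?<=>)` requires a `>` just before the match and `(?=<)` a `<` just after it, without including either in the output, while `-o` prints only the matched part, one match per line. Since `\w+` matches only word characters, text containing spaces or punctuation is not matched. A sketch, assuming GNU grep built with PCRE support (`text_between_tags` is hypothetical):

```shell
#!/bin/sh
text_between_tags() {
  # -P : Perl-compatible regex (GNU grep); -o : print only the matches
  # (?<=>) and (?=<) : lookbehind/lookahead, not part of the output
  grep -Po '(?<=>)\w+(?=<)' "$@"
}

printf '<a>one</a><b>two</b>\n<p>hello world</p>\n' | text_between_tags
```

Here `one` and `two` are printed, while `hello world` is skipped because of the space.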

First, thanks to huaihaizi3 and agama for the quick responses.

It worked! I had been trying to solve this for 2 hours, but you did it in 10 minutes.

By the way, could you explain the code? I have been reading man awk but could not find the answers.


Note: I edited the code in my post...

Yes, I always seem to forget that. A BSD sed version, just for completeness:

sed -n 's/.*<tag>//; t a
d
:a
s/<\/tag>.*//p'

Newlines are required, because BSD sed reads a branch label up to the end of the line.
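A version that avoids branching altogether can use address negation (`/regex/!cmd`, which is POSIX) for the filtering, so it should behave the same under GNU and BSD sed. This is a swapped-in technique, not the one posted above, and `extract_tag_posix` is a hypothetical name:

```shell
#!/bin/sh
extract_tag_posix() {
  # /<tag>/!d      : discard lines that contain no opening tag
  # s/.*<tag>//    : remove everything up to and including the (last) <tag>
  # s/<\/tag>.*//p : remove the closing tag and the rest; print only if it matched
  sed -n '/<tag>/!d
s/.*<tag>//
s/<\/tag>.*//p' "$@"
}

printf '%s\n' '<p><tag>hello</tag></p>' 'plain' '<tag>no close' | extract_tag_posix
# prints: hello
```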