How to get a tag's content with grep

1) Is it possible to get a tag's content with grep -E? For example, the title: given the source text "<title>My page</title>", print "My page".

2) Which utility should I use from bash when I want to use a regex in this format?
(?<=title>).*(?=</title)

Perl.

perl -nle 'print $& if /(?<=title>).*(?=<\/title)/' file

grep works one line at a time, so an element whose content spans several lines won't match. The same goes for other line-based tools such as sed, at least in their default mode.

For a problem like this I'd use awk. It has powerful regexes like sed and grep's, but is an actual programming language where you get to pick exactly what gets printed when, remember things with variables, etc.

$ echo -e "<title>stuff\na\nb\nc</title>" |
awk -v RS="<" '
        /^title>/ { sub(/^title>/, "", $0); P=1 }
        /^\/title>/ { P=0 }
        P'
stuff
a
b
c

$

Nice. Do you think I could use it with gnuwin32? I just downloaded GnuWin perl and there are pcregrep.exe and pcretest.exe. I would like to run it on Win XP.

You should run these things in a bash/ksh/zsh shell or what have you. Windows CMD has awful quoting problems -- quoting is more or less left as a problem for the utility itself, not something CMD does -- which means every utility seems to handle quoting slightly differently. Sometimes there's just no way to control when an argument gets split or passed raw, which makes it extremely difficult to pass a regular expression in single quotes to any program.

If you can install awk and bash in gnuwin32, I don't see why it wouldn't work.
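One way to sidestep most of the quoting trouble, sketched here under assumptions: keep the awk program in a file and load it with awk's -f option, so no regex has to survive the command line. The file name title.awk and the function name extract_title are made up for illustration; the program itself is the one from the earlier reply.

```shell
# Write the awk program to a file once (title.awk is a hypothetical name).
cat > title.awk <<'EOF'
/^title>/   { sub(/^title>/, "", $0); P = 1 }   # start printing after "<title>"
/^\/title>/ { P = 0 }                            # stop at "</title>"
P                                                # print the record while P is set
EOF

# The regexes now live in the file; only RS is still passed as an argument.
extract_title() {
    awk -v RS="<" -f title.awk "$@"
}
```

Usage would be something like extract_title page.html, or piping text into extract_title. I have not tried this under CMD, so treat it as a starting point rather than a guaranteed fix.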

For titles that span several lines:

perl -ln0e '$,="\n";print /(?<=<title>).*?(?=<\/title)/sg' file

Appears to work.

What do the commandline options actually mean? 'man perl' helpfully tells me they're not documented in 'man perl' but doesn't say where they are documented...

man perlrun

BTW my man perl said where to find those options (in Reference Manual section):

           perlrun             Perl execution and options

:)
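For quick reference, here is the multi-line perl one-liner from above again with the options spelled out in comments, as I read them from perlrun. The wrapper function name titles is only there for illustration.

```shell
# Print the text of every <title>...</title> element, even across lines.
#   -l  chomp each record and set the output separator $\ to the current $/
#       (a newline here, because -l comes before -0)
#   -n  wrap the code in an implicit "while (<>) { ... }" loop
#   -0  set the input separator $/ to the NUL character, so a file without
#       NUL bytes is read as a single record
#   -e  take the program from the next argument
# In the regex, /s lets "." match newlines and /g returns every match;
# $, = "\n" puts each match on its own line when the list is printed.
titles() {
    perl -ln0e '$,="\n"; print /(?<=<title>).*?(?=<\/title)/sg' "$@"
}
```

Called as titles file, this should behave the same as the original command.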

I still have a problem, though: I don't know how to feed the data from a file to the perl command. In the cmd interpreter I tried this:

for /f "delims=" %a in ('dir /b *.a') do (
pcretest.exe -ln0e '$,="\n";print /(?<=<title>).*?(?=<\/title)/sg' < "%a"
)

It tells me that < was unexpected at this point. Is it OK to ask here, or should I ask in the DOS forum instead?

Edit: the testing file a.a contains some html text. e.g.:
something.txt
<title>Hello title</title>
balbalaba

I would strongly suggest that you install a Linux distribution (for example in VirtualBox) and do your pattern matching there.

And what about grep? Can it do this for a single line? grep is my favourite tool, but I don't know whether it can filter out the rest of the line so that just "Hello title" remains.

You're hitting the exact problem I just explained to you: quoting in CMD is a horrid botch. If you can install an actual shell to use in your system you'll have a better chance in it.
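On the grep question, for a title that sits on one line: grep -E has no lookaround, but GNU grep accepts the same Perl-style pattern with -P when it was built with PCRE support, and -o prints only the matched part instead of the whole line. A minimal sketch, assuming such a grep is available; the function name title_of_line is hypothetical.

```shell
# Print only the text between <title> and </title> on a single line.
# Assumes GNU grep built with PCRE support (-P); -o prints just the match.
title_of_line() {
    grep -oP '(?<=<title>).*?(?=</title>)' "$@"
}
```

Run against the sample a.a file above, title_of_line a.a should leave just the title text. The quoting caveat still applies: the single quotes are meant for a real shell, not for CMD.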