I would like to extract HTML text that sits between two marker lines, with the markers held in shell variables. I am running Debian and using the mksh shell.
Here is the part of the HTML that I want to extract from. I want the words 'to love', and I would like to use the lines above and below them as reference points.
<span class="lemma_definition">
to love
</span>
Working script that does not use variables:
#!/bin/sh
URL="perseus.tufts.edu/hopper/morph?l=amo&la=la"
# Working: prints top definition:
wget -q -O- "$URL" | awk '/<span class="lemma_definition">/,/<\/span>/ {{ if (!/>/) {{$1=$1}1; print $0}} }'
NOTES:
(!/>/) = Skip any line that contains a '>'.
{$1=$1}1; = Strips the leading spaces from the result; otherwise it comes out as: ' <several spaces are here> to love'
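The whitespace note above can be checked in isolation. This is only a small sketch, with a made-up input line standing in for the indented line from the page. In this standalone form the trailing 1 acts as an always-true pattern that prints the record.

```shell
#!/bin/sh
# Assigning to any field makes awk rebuild the record from its fields,
# joined by single spaces, which drops the leading whitespace.
# The input line is a stand-in for the indented line in the real page.
trimmed=$(printf '        to love\n' | awk '{ $1 = $1 } 1')

# Brackets make any leftover whitespace visible.
printf '[%s]\n' "$trimmed"    # prints [to love]
```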
How can I properly use the variables to make it work like the non-variable code? I've been reading tutorials but have not come across this situation yet.
The order of the three statements determines whether the boundaries are included. Here both are excluded.
The match uses regular expressions, so special characters need to be escaped in $wIn and $wOut.
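A minimal sketch of what such a three-statement awk program could look like, under some assumptions: the flag name p, the function name extract_regex and the sample input are hypothetical, and the sample replaces the wget output so the script runs offline. The variables are passed in with -v and matched as dynamic regular expressions.

```shell
#!/bin/sh
# Sample input standing in for the downloaded page (assumption: same
# three-line shape as the snippet in the question).
html='<span class="lemma_definition">
            to love
</span>'

# Both values are interpreted as regular expressions by awk.
wIn='<span class="lemma_definition">'
wOut='</span>'

# extract_regex is a hypothetical helper name, as is the flag p.
extract_regex() {
    awk -v wIn="$1" -v wOut="$2" '
        $0 ~ wOut { p = 0 }             # closing boundary: stop before printing it
        p         { $1 = $1; print }    # inside the block: trim and print
        $0 ~ wIn  { p = 1 }             # opening boundary: start after this line
    '
}

result=$(printf '%s\n' "$html" | extract_regex "$wIn" "$wOut")
printf '%s\n' "$result"
```

Moving the print statement above the wOut test would include the closing line, and moving it below the wIn test would include the opening line. With the real page, the printf would be replaced by the wget pipeline from the question.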
The following variant works with plain strings:
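A minimal sketch of such a plain-string variant, assuming awk's index() function in place of the regex match. The function name extract_plain, the flag p and the bracketed markers are hypothetical. The markers are chosen so that a regex match would misread them as character classes.

```shell
#!/bin/sh
# Hypothetical sample: as regular expressions, [start] and [end] would be
# bracket expressions; index() compares them literally instead.
text='intro
[start]
      to love
[end]
outro'

extract_plain() {
    awk -v wIn="$1" -v wOut="$2" '
        index($0, wOut) { p = 0 }             # literal closing marker found
        p               { $1 = $1; print }    # inside the block: trim and print
        index($0, wIn)  { p = 1 }             # literal opening marker found
    '
}

result=$(printf '%s\n' "$text" | extract_plain '[start]' '[end]')
printf '%s\n' "$result"
```

One caveat: -v still processes backslash escape sequences in the assigned value, so markers that contain backslashes may need to be passed another way, for example through the environment and ENVIRON.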