I am cleaning up HTML with sed. With the regexp
<a name="[A-Za-z0-9 ?.]+"></a><h[123]>[ ]*<span class="mw-headline" >[A-Za-z0-9 ?.]+</span></h[123]>
I can find the tags I need. But when I place them in a sed command, sed fails. So I started building up from a smaller command. This is where I am now:
sed -r -e s/"<a name=\"/replacement/ <in >out
This works. But when I enter:
sed -r -e s/"<a name=\"[A-Za-z0-9 ?_.]+"/replacement/ <in >out
it fails with:
sed: can't read <in: Invalid argument
sed: can't read >out: Invalid argument
But the file in really exists. How can I get the regexp into the sed command? I have tried both escaping and not escaping the characters, but sed does not seem to accept it.
Can you provide the output you desire?
Regards
From a tag like this:
<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>
I'd like to make:
<a name="Introduction"></a><h1><span class="mw-headline" id="Introduction" >Introduction</span></h1>
Therefore I do the following replacement:
Match:
<a name="([A-Za-z0-9 ?.]+)"></a><h([123])>[^mw]*mw-headline" >([A-Za-z0-9 ?.]+)</span></h[123]>
And replace it with:
<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>
This works when using a find and replace editor which accepts regex. But I can't seem to fit it in one sed command.
Something like:
echo '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>'|
sed 's/\(.*"\)\(.*\)/\1 id="Introduction" \2/'
Regards
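A more general variant, as a rough sketch only: it puts the match and replacement from the earlier post into one sed command and reuses the captured name instead of hard-coding "Introduction". It assumes a POSIX-style shell, where wrapping the whole script in single quotes means the double quotes inside the regexp need no escaping, and a sed that accepts -E (GNU sed treats it the same as -r). The function name add_headline_id is made up for this example, and | is used as the s delimiter so the slashes in the closing tags need no escaping.

```shell
# Hypothetical helper: adds id="<name>" to the mw-headline span, taking <name>
# from the preceding <a name="..."> anchor. Reads stdin or the files passed in.
add_headline_id() {
  # \1 = anchor name, \2 = heading level, \3 = headline text
  sed -E 's|<a name="([A-Za-z0-9 ?_.]+)"></a><h([123])>[ ]*<span class="mw-headline" >([A-Za-z0-9 ?_.]+)</span></h[123]>|<a name="\1"></a><h\2><span class="mw-headline" id="\1" >\3</span></h\2>|' "$@"
}

# Demo on the sample tag from the thread
printf '%s\n' '<a name="Introduction"></a><h1><span class="mw-headline" >Introduction</span></h1>' | add_headline_id
# prints: <a name="Introduction"></a><h1><span class="mw-headline" id="Introduction" >Introduction</span></h1>
```

On the real files this would be called as add_headline_id <in >out. Lines that do not match the pattern pass through unchanged.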