How to delete certain JavaScript from HTML files using sed

I am cleaning up forum posts to convert them into an offline reading version with clean HTML text. All the files have the .html extension and reside in one folder. There is some JavaScript I would like to remove, which looks like this:

<script LANGUAGE="JavaScript1.1">
<!--

function mMz()
{
 var mPz = "";
 for(var prop in this) {
 if ((prop.charAt(0) == '_' && prop.charAt(prop.length-1)=='_')
		      || ((typeof this[prop]) == 'function')) 
 continue;
 if (mPz != "") mPz += '&';
 mPz += prop + ':' + escape(this[prop]);
 }
 var cookie = this.gHa + '=' + mPz;
 if (this._expiration_)
 cookie += '; expires=' + this._expiration_.toGMTString();
 if (this._path_) cookie += '; path=' + this._path_;
 if (this._domain_) cookie += '; domain=' + this._domain_;
 if (this._secure_) cookie += '; secure';
 
 this._document_.cookie = cookie;
}
//-->
</script>

I tried using the following sed command:

sed -i.bak '/\<script LANGUAGE="JavaScript\1\.\1"\>/,/<\/script\>/d' *.html

but it results in this error:

sed: -e expression #1, char 40: Invalid back reference

Any help would be greatly appreciated.

Welcome to the forum.

First off, I never recommend editing in place, but you've had the wisdom to use backup files, so that's good.

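You don't need to escape the digits, and that is why it's complaining: in sed, \1 means "back-reference to the first \( \) group", and since your pattern has no groups, sed reports "Invalid back reference". You don't need to escape < and > either; in GNU sed, \< and \> are word-boundary anchors, not literal angle brackets, so they would stop the pattern from matching even once the digits are fixed.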
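A quick way to see both problems, assuming GNU sed (the error message in the question is GNU sed's), is to run three variants of the pattern against the opening tag. This is only a sketch; the variable names are made up for the demo.

```shell
#!/bin/sh
# Opening tag from the question.
line='<script LANGUAGE="JavaScript1.1">'

# 1. Original pattern: \1 refers to a \( \) group that does not exist,
#    so GNU sed refuses to compile it ("Invalid back reference").
if printf '%s\n' "$line" |
   sed -n '/\<script LANGUAGE="JavaScript\1\.\1"\>/p' 2>/dev/null; then
  orig_status=ok
else
  orig_status=error
fi

# 2. Digits unescaped, but \< and \> kept: in GNU sed these are word-boundary
#    anchors, and the spot between '"' and '>' is not a word end, so no match.
fixed_digits=$(printf '%s\n' "$line" |
  sed -n '/\<script LANGUAGE="JavaScript1\.1"\>/p')

# 3. Plain < and >: the tag is matched literally.
plain=$(printf '%s\n' "$line" | sed -n '/<script LANGUAGE="JavaScript1\.1">/p')

printf 'original: %s\n' "$orig_status"
printf 'digits fixed, anchors kept: [%s]\n' "$fixed_digits"
printf 'plain angle brackets: [%s]\n' "$plain"
```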

Another trick is that you can put characters inside [] to "escape" them, since sed doesn't treat most special characters inside a bracket expression as special. In some situations this is easier to read than a backslash.
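For example, a dot inside [] is just a dot. A small sketch (the two sample strings are made up) comparing an unescaped ".", "[.]" and "\.":

```shell
#!/bin/sh
# Two sample strings: only the first contains a literal dot.
input=$(printf '%s\n' 'JavaScript1.1' 'JavaScript1x1')

# Unescaped ".": matches any character, so both lines are printed.
bare=$(printf '%s\n' "$input" | sed -n '/JavaScript1.1/p')

# "[.]": inside a bracket expression the dot is literal, same as "\.".
bracket=$(printf '%s\n' "$input" | sed -n '/JavaScript1[.]1/p')
escaped=$(printf '%s\n' "$input" | sed -n '/JavaScript1\.1/p')

printf 'bare:\n%s\n' "$bare"
printf 'bracket: %s\n' "$bracket"
printf 'escaped: %s\n' "$escaped"
```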

This seems to work.

$ cat data

BEFORE
<script LANGUAGE="JavaScript1.1">
<!--

function mMz()
{
 var mPz = "";
 for(var prop in this) {
 if ((prop.charAt(0) == '_' && prop.charAt(prop.length-1)=='_')
                      || ((typeof this[prop]) == 'function'))
 continue;
 if (mPz != "") mPz += '&';
 mPz += prop + ':' + escape(this[prop]);
 }
 var cookie = this.gHa + '=' + mPz;
 if (this._expiration_)
 cookie += '; expires=' + this._expiration_.toGMTString();
 if (this._path_) cookie += '; path=' + this._path_;
 if (this._domain_) cookie += '; domain=' + this._domain_;
 if (this._secure_) cookie += '; secure';

 this._document_.cookie = cookie;
}
//-->
</script>
AFTER

$ sed '/<script LANGUAGE="JavaScript1[.]1">/,/<[/]script>/d' data

BEFORE
AFTER

$
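To run the same deletion over every .html file in the folder while keeping backups, something like the sketch below should do. It builds a hypothetical scratch directory with one sample page so it can be tried safely; it escapes the slash in </script> as \/, which every sed accepts as a literal slash inside a /.../ address.

```shell
#!/bin/sh
# Hypothetical scratch directory with one sample page, so nothing real is touched.
dir=$(mktemp -d)
cat > "$dir/page.html" <<'EOF'
BEFORE
<script LANGUAGE="JavaScript1.1">
<!--
function mMz() {}
//-->
</script>
AFTER
EOF

cd "$dir" || exit 1

# -i.bak edits each file in place and keeps the original as <name>.html.bak.
# The range deletes every line from the opening tag through the next </script>.
sed -i.bak '/<script LANGUAGE="JavaScript1[.]1">/,/<\/script>/d' *.html

result=$(cat page.html)
printf '%s\n' "$result"   # only BEFORE and AFTER remain
ls                        # page.html  page.html.bak
```

Keep in mind that a sed range works on whole lines: if the opening tag shares a line with other text, that text goes too, and if no </script> follows, everything up to the end of the file is deleted.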

You are great! Thank you so much.

Would you also be willing to give me some more help? Here is the point:
I would like to clear out some content between particular HTML tags and leave the tags intact. For example:

-------------quote
<td bgcolor="#eeeeee"><b><a href="Spiritual Treasures - Kriya Yoga download">vandool</a></b> </td>
--------------unquote

I would like to keep only:

<td bgcolor="#eeeeee"></b> </td>

There may be several such occurrences on a page.

Best regards
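One possible approach for this second question is a substitution rather than a delete. This is a sketch that assumes each cell and its link sit on a single line, that the attribute is written exactly as bgcolor="#eeeeee", and that the link text contains no "<". The second, doubled line is hypothetical, only there to show several matches on one line.

```shell
#!/bin/sh
# Cell from the question, plus a hypothetical line holding two such cells.
cell='<td bgcolor="#eeeeee"><b><a href="Spiritual Treasures - Kriya Yoga download">vandool</a></b> </td>'
two="$cell$cell"

# \( ... \) captures the <td ...> opener so \1 can put it back; the
# <b><a href="...">text</a> that follows it is dropped. "|" is used as the
# s delimiter because the pattern contains "/" and "#". The g flag repeats
# the substitution for every match on the line.
strip='s|\(<td bgcolor="#eeeeee">\)<b><a href="[^"]*">[^<]*</a>|\1|g'

one_out=$(printf '%s\n' "$cell" | sed "$strip")
two_out=$(printf '%s\n' "$two" | sed "$strip")

printf '%s\n' "$one_out" "$two_out"
```

On real files the same script would go in place of the delete command, e.g. sed -i.bak "$strip" *.html. The result keeps the orphaned </b>, exactly as in the requested output; appending </b> to the pattern would remove it as well. If the markup is split across lines, sed's line-by-line model makes this much harder, and an HTML-aware tool is usually a better fit.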