remove large portion of web page code between two tags

Hi everybody,

I am trying to remove a bunch of lines from web pages between two tags:
one is <h1> and the other is <table

it looks like

<h1>Anniversary cards roses</h1>
many
lines here
<table summary="Free anniversary greeting cards." cellspacing="8" cellpadding="8" width="70%">

my goal is to delete everything from <h1> up to the table, including the <h1> line itself, but keep this line untouched:

<table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%">

any help is greatly appreciated.

perl -lp0e 's/<h1>.*<table/<table/s' infile > outfile
awk 'BEGIN{ok=1}
       /^<h1>/ {ok=0}
       /<^table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%"> {ok=1}
      ok==1 {print}
      ok==0 {next} '   inputfile > outputfile 

This clobbers everything from the FIRST <h1> tag onward, including the tag itself. It stops clobbering at the exact table summary statement you gave.


bartus11, your command works great.

perl -lp0e 's/<h1>.*<table/<table/s' infile > outfile

but would you please advise how to make the changes in place and create a backup file?

thank you very much.

jim, when I ran your code it came up with an error message:

awk: cmd. line:3:  ^ unterminated regexp

but i also thank you for your time and effort.
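
For anyone hitting the same error: it comes from the third line of the awk script, where the ^ sits after the < and the regexp has no closing /. A corrected sketch follows, assuming both tags start at the beginning of a line as in the sample. It matches any line beginning with "<table " rather than the full attribute string, which is a simplification, and the sample file is made up for the check:

```shell
# Small sample page (hypothetical content modelled on the question).
awkdir=$(mktemp -d)
cat > "$awkdir/inputfile" <<'EOF'
<html>
<h1>Anniversary cards roses</h1>
many
lines here
<table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%">
<tr><td>card</td></tr>
EOF

awk '
    /^<h1>/    { skip = 1 }   # start dropping lines at the heading
    /^<table / { skip = 0 }   # resume printing at the table tag
    !skip                     # print every line while not skipping
' "$awkdir/inputfile" > "$awkdir/outputfile"
```

Unlike the perl version, this works line by line, so it only helps when <h1> and <table each begin a line.
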

perl -i.bak -lp0e 's/<h1>.*<table/<table/s' infile

thank you again.

really appreciate it