Hi everybody,
I am trying to remove a bunch of lines from web pages between two tags:
one is <h1> and the other is <table
It looks like this:
<h1>Anniversary cards roses</h1>
many
lines here
<table summary="Free anniversary greeting cards." cellspacing="8" cellpadding="8" width="70%">
My goal is to delete everything from <h1> onward, including the <h1> line itself, but keep this line untouched:
<table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%">
Any help is greatly appreciated.
perl -lp0e 's/<h1>.*<table/<table/s' infile > outfile
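A quick way to see what this one-liner does is to run it on a small test page. The sketch below assumes a made-up file called sample.html modelled on the question; the file names and the surrounding <p> and <tr> lines are only for illustration.

```shell
# Build a small sample page (hypothetical content modelled on the question).
cat > sample.html <<'EOF'
<p>intro</p>
<h1>Anniversary cards roses</h1>
many
lines here
<table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%">
<tr><td>card</td></tr>
</table>
EOF

# -0 : use NUL as the record separator, so the whole file is read as one record
# -p : apply the code to each record and print the result
# -l : chomp the record separator on input and append a newline on output
# /s : let "." match newlines, so the match can span many lines
perl -lp0e 's/<h1>.*<table/<table/s' sample.html > cleaned.html

cat cleaned.html
```

Note that .* is greedy: if a page has more than one <table after the <h1>, everything up to the last one is removed. Changing .* to .*? should stop at the first <table instead.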
awk 'BEGIN{ok=1}
/^<h1>/ {ok=0}
/<^table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%"> {ok=1}
ok==1 {print}
ok==0 {next} ' inputfile > outputfile
This deletes everything from the FIRST <h1> tag onward, including that tag. It stops deleting at the exact table summary line you gave.
bartus11, your command works great.
perl -lp0e 's/<h1>.*<table/<table/s' infile > outfile
But would you please advise how to make all the changes in place and create a backup file?
Thank you very much.
Jim, when I ran your code it came up with an error message:
awk: cmd. line:3: ^ unterminated regexp
But I also thank you for your time and effort.
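The error points at the third line of the awk program: the pattern there has no closing slash, and the caret sits after the < instead of at the start of the pattern. A corrected sketch follows, assuming the <h1> and <table tags each start at the beginning of a line; the sample input is made up for illustration. The pattern matches only the start of the table line, so it does not matter whether the summary text ends with a period.

```shell
# Sample input (hypothetical, modelled on the question).
cat > inputfile <<'EOF'
<p>intro</p>
<h1>Anniversary cards roses</h1>
many
lines here
<table summary="Free anniversary greeting cards" cellspacing="8" cellpadding="8" width="70%">
<tr><td>card</td></tr>
</table>
EOF

# ok=1 means "print this line". The <h1> line switches printing off;
# the table line switches it back on before the print rule runs,
# so the table line itself is kept.
awk 'BEGIN { ok = 1 }
/^<h1>/ { ok = 0 }
/^<table summary="Free anniversary greeting cards/ { ok = 1 }
ok == 1 { print }' inputfile > outputfile

cat outputfile
```

Unlike the perl version, this works line by line, so it resumes printing at the first matching table line after each <h1>.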
perl -i.bak -lp0e 's/<h1>.*<table/<table/s' infile