Replacing pattern spanning multiple lines

Hi. I have input like this:

	<tr>
		<td class="logo1" rowspan="2"><a href="index.html"><img
			src="images/logo.png" /></a></td>
		<td class="pad1" rowspan="2">__</td>
		<td class="userBox"><img src="images/person.png"/> <a href="http://good.mybook.com/login.jsp">Sign In</a></td>
		<td class="searchBox"><a href="http://good.mybook.com/searchhelp.jsp">Search mybook</a>
		<input type="text" size="14" id="searchTextBox"
			onkeypress="if(event.keyCode == 13) { search(); return false; }" />
		<input type="button" value="Go" onclick="search();" /></td>
	</tr>
	<tr>
		<td>&nbsp;</td>
		<td class="bentyLogo"><a
			href="http://www.comp.benty.ac.uk/parsic"><img
			src="images/benty.png" /></a></td>
	</tr>
	<tr>
		<td class="logoPad">__</td>
		<td class="pad2">__</td>
		<td class="title">
		<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
		</td>
		<td class="bentyText"><a
			href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
	</tr>

I would like to delete everything from <tr> to </tr> for the first and third "tr" tag pairs. The problem is that the same <tr>...</tr> pattern occurs all over the file (not just in the input sample) and I do not want to delete them all, so I cannot simply use sed 's/<tr.*\/tr>//g'. I thought I might try sed 's/<tr.*pad2.*\/tr>//g', but that does not work either, because sed matches one line at a time and these blocks span several lines. I have been searching Google and trying to solve this all day, to no avail. Please help. Thanks.

What does your input file look like? Does it contain only these <tr></tr> blocks?

If so, is this your required output?

<tr>         
<td>&nbsp;</td>
<td class="bentyLogo"><a href="http://www.comp.benty.ac.uk/parsic">
<img src="images/benty.png" /></a>
</td>    
</tr>

regards,
Ahamed

Thank you for that quick reply, brother.
Yes, if it is based on just that input sample. Unfortunately there are many more "tr" pairs in the original input. By the way, how would you do it if it were based on just the input sample?

Thanks. Salam.

I tried to write a script for this. There may be some bugs, but you can test it on your input file. I hope it helps.

# cat file
<tr>
                <td class="logo1" rowspan="2"><a href="index.html"><img
                        src="images/logo.png" /></a></td>
                <td class="pad1" rowspan="2">__</td>
                <td class="userBox"><img src="images/person.png"/> <a href="http://good.mybook.com/login.jsp">Sign In</a></td>
                <td class="searchBox"><a href="http://good.mybook.com/searchhelp.jsp">Search mybook</a>
                <input type="text" size="14" id="searchTextBox"
                        onkeypress="if(event.keyCode == 13) { search(); return false; }" />
                <input type="button" value="Go" onclick="search();" /></td>
        </tr>
        <tr>
                <td>&nbsp;</td>
                <td class="bentyLogo"><a
                        href="http://www.comp.benty.ac.uk/parsic"><img
                        src="images/benty.png" /></a></td>
        </tr>
        <tr>
                <td class="logoPad">__</td>
                <td class="pad2">__</td>
                <td class="title">
                <h2>Cat-by-Cat mybook - Cook (56:47)</h2>
                </td>
                <td class="bentyText"><a
                        href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
        </tr>
        <tr>
                <td class="logoPad">__</td>
                <td class="pad2">__</td>
                <td class="title">
                <h2>Cat-by-Cat mybook - Cook (56:47)</h2>
                </td>
                <td class="bentyText"><a
                        href="http://www.comp.benty.ac.uk/parsic">Mona Research GroupXXX</a></td>
        </tr>
#./justdoit 1,3 tr file
       <tr>
               <td>&nbsp;</td>
               <td class="bentyLogo"><a
                       href="http://www.comp.benty.ac.uk/parsic"><img
               <h2>Cat-by-Cat mybook - Cook (56:47)</h2>
               </td>
               <td class="bentyText"><a
                       href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
       </tr>
       <tr>
               <td class="logoPad">__</td>
               <td class="pad2">__</td>
               <td class="title">
               <h2>Cat-by-Cat mybook - Cook (56:47)</h2>
               </td>
               <td class="bentyText"><a
                       href="http://www.comp.benty.ac.uk/parsic">Mona Research GroupXXX</a></td>
       </tr>
# cat justdoit
#!/bin/bash
## justdoit SED scripts##
 
if [ $# != 3 ]  ; then
echo "Usage $0 tagpairs_nr[1 and 3] tagname[tr,pre,table..] inputfile
eg.. '$0 1,3 tr index.html'" ; exit 1
fi
 
removes=$1
if [ ! $(echo "$removes"|sed -n '/[1-2],[3-6]/p') ] ; then
echo "Tag pair numbers must be in [1-2],[3-6] format" ; exit 1
fi
 
tag=$2
f=$(echo $removes|sed 's/\([0-9]\),[0-9]/\1/')
l=$(echo $removes|sed 's/[0-9],\([0-9]\)/\1/')
 
if (( $f >= 3 )) ; then
echo "Script works only when the first value is 1 or 2" ; exit 1
fi
 
file=$3
# Keep a backup, since the file is edited in place below.
if ! cp "$file" "${file}bck" ; then
echo "Backing up the input file failed" ; exit 1
fi
 
case $f in
  1) start=4 ;;
  2) start=2 ;;
esac
 
case $l in
  3) scale=6 ;;
  4) [ $f = 1 ] && scale=(2 6) || scale=4 ;;
  5) [ $f = 1 ] && scale=(2 2 6) || scale=(4 2 4) ;;
  6) [ $f = 1 ] && scale=(2 2 2 6) || scale=(4 2 2 4) ;;
esac
 
 
taglines=$(sed -n '/[/]*'$tag'/=' $file|sed -n '$=')
tagpairlines=$(sed -n '/[/]*'$tag'/=' $file|sed '$!N;s/\n/,/'|sed -n '$=')
calv=$(echo $l | awk '{print '$taglines' / $1 }')
[ $(echo $calv | sed -n '/[0-9]\.[0-9]*$/p') ] && calv=$(echo ${calv%.*}) && calv=$(( calv + 1))
sedarrix=$(( tagpairlines - calv ))
 
x=0 ; limit=${#scale[@]}
pattern=("$start s/,*//;")
second=start
while [ $(( sedarrix -=1 )) -gt -1 ] ; do
resume=${scale[x]}
second=$(( second  + resume ))
pattern=(${pattern[@]} "$second s/.*//;")
((x++)) ; [ $x = $limit ] && x=0
done
 
fullsed="sed -n '/[/]*'$tag'/=' $file | sed '$!N;s/\n/,/; ${pattern[@]} '"
fullsedarr=($(eval $fullsed |sed ':a;$!N;s/\n/ /;ta') ) ;
x=0 ; fullsedarrix=${#fullsedarr[@]}
removepairs=${fullsedarr[x]}
 
# open file and sed processing.
while [ $(( fullsedarrix -=1 )) -gt 1 ] ; do
sed ' '$removepairs' d' $file >${file}tmp && mv ${file}tmp $file
((x++)) ; correctsedv=$(($(echo ${fullsedarr[x]}|sed 's/\([0-9]*\),\([0-9]*\)$/\2-\1+1/')))
fullcorrectsedv=$((fullcorrectsedv+correctsedv))
corrv=($(echo ${fullsedarr[x]}|sed 's/\([0-9]*\),\([0-9]*\)$/\1-'$fullcorrectsedv' \2-'$fullcorrectsedv'/'))
y=0 ; for i in ${corrv[@]} ; do reorg[y]=$(($i)); ((y++)) ; done
removepairs=$(echo ${reorg[@]}|sed 's/ /,/' )
done
more $file

regards
ygemici

Try this...

# to delete the 1st and 3rd <tr>...</tr> pairs
awk '/<tr>/ {i=i+1}  {if(i==1 || i==3){next} print}' file

Further, if you want to delete, say, only the second pair, tweak the above code as follows:

#to delete 2nd <tr>...</tr> pair
awk '/<tr>/ {i=i+1}  {if(i==2){next} print}' file
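
Two caveats with the one-liners above: /<tr>/ only matches a bare <tr> (not <tr class="...">), and the counter keeps its value after </tr>, so any lines between a deleted block and the next <tr> are deleted as well. A minimal sketch that tracks whether we are inside a block might look like this. The function name delete_tr_blocks is only illustrative, and it assumes each opening and closing tr tag sits on its own line and that rows are not nested:

```shell
# Hypothetical helper (name is illustrative, not from the thread):
# print the file named in $2 without the <tr>...</tr> blocks whose
# ordinal numbers are listed, comma-separated, in $1.
# Assumes <tr ...> and </tr> are on their own lines and rows are not nested.
delete_tr_blocks() {
    awk -v drop="$1" '
        BEGIN {
            # Turn "1,3" into a lookup table: want["1"], want["3"].
            n = split(drop, parts, ",")
            for (k = 1; k <= n; k++) want[parts[k]] = 1
        }
        /<tr[ >]/ { count++; inside = 1 }        # opening tag, with or without attributes
        !(inside && (count in want)) { print }   # keep everything outside the dropped blocks
        /<\/tr>/ { inside = 0 }                  # closing tag ends the current block
    ' "$2"
}
```

It would be called as: delete_tr_blocks 1,3 file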

regards,
Ahamed

Your code is long ygemici. Thanks.

Thank you Mr Ahamed. Your code is much better.
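
For the case where there are many more "tr" pairs and counting them is impractical, the content-based idea from the first post (dropping the rows that contain pad2) can probably be done by buffering each block and deciding at </tr> whether to print it. This is only a sketch under the same assumptions as before (tags on their own lines, no nested rows); the function name delete_tr_matching is made up for illustration:

```shell
# Hypothetical helper (name is illustrative, not from the thread):
# print the file named in $2, omitting every <tr>...</tr> block
# in which the regular expression $1 matches at least one line.
# Assumes <tr ...> and </tr> are on their own lines and rows are not nested.
delete_tr_matching() {
    awk -v pat="$1" '
        /<tr[ >]/ { inside = 1; buf = ""; hit = 0 }  # start buffering a new block
        inside {
            buf = buf $0 "\n"                        # hold the line back for now
            if ($0 ~ pat) hit = 1                    # remember whether the block matched
            if ($0 ~ /<\/tr>/) {                     # block complete: emit or discard it
                if (!hit) printf "%s", buf
                inside = 0
            }
            next
        }
        { print }                                    # lines outside any block pass through
        END { if (inside) printf "%s", buf }         # do not lose an unterminated block
    ' "$2"
}
```

On the input sample, delete_tr_matching 'class="pad[12]"' file should drop the first and third rows, since they are the ones carrying the pad1 and pad2 cells.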