zorrox
April 30, 2011, 11:40am
1
Hi. I have input like this:
<tr>
<td class="logo1" rowspan="2"><a href="index.html"><img
src="images/logo.png" /></a></td>
<td class="pad1" rowspan="2">__</td>
<td class="userBox"><img src="images/person.png"/> <a href="http://good.mybook.com/login.jsp">Sign In</a></td>
<td class="searchBox"><a href="http://good.mybook.com/searchhelp.jsp">Search mybook</a>
<input type="text" size="14" id="searchTextBox"
onkeypress="if(event.keyCode == 13) { search(); return false; }" />
<input type="button" value="Go" onclick="search();" /></td>
</tr>
<tr>
<td>�</td>
<td class="bentyLogo"><a
href="http://www.comp.benty.ac.uk/parsic"><img
src="images/benty.png" /></a></td>
</tr>
<tr>
<td class="logoPad">__</td>
<td class="pad2">__</td>
<td class="title">
<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
</td>
<td class="bentyText"><a
href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
</tr>
I would like to delete everything from <tr> to </tr> for the first and third "tr" tag pairs. The problem is that the same <tr>...</tr> pattern appears all over the file (not just in the sample above) and I do not want to delete every occurrence, so I cannot simply use sed 's/<tr.*\/tr>//g'. I thought I might try sed 's/<tr.*pad2.*\/tr>//g', but that will not work either, because sed processes one line at a time and each block spans several lines. I have been searching Google and trying to solve this all day, to no avail. Please help. Thanks.
What does your input file look like? Does it contain only these <tr>...</tr> blocks?
If so, is this your required output?
<tr>
<td>�</td>
<td class="bentyLogo"><a href="http://www.comp.benty.ac.uk/parsic">
<img src="images/benty.png" /></a>
</td>
</tr>
regards,
Ahamed
zorrox
April 30, 2011, 11:59am
3
Thank you for that quick reply brother.
Yes, if it is based on just that input sample. Unfortunately, there are a lot more "tr" pairs in the original input. By the way, how would you do it if it were based on just the input sample?
Thanks. Salam.
I tried to write a script for this. There may be some bugs, but you can test it on your input file. I hope it helps.
# cat file
<tr>
<td class="logo1" rowspan="2"><a href="index.html"><img
src="images/logo.png" /></a></td>
<td class="pad1" rowspan="2">__</td>
<td class="userBox"><img src="images/person.png"/> <a href="http://good.mybook.com/login.jsp">Sign In</a></td>
<td class="searchBox"><a href="http://good.mybook.com/searchhelp.jsp">Search mybook</a>
<input type="text" size="14" id="searchTextBox"
onkeypress="if(event.keyCode == 13) { search(); return false; }" />
<input type="button" value="Go" onclick="search();" /></td>
</tr>
<tr>
<td>�</td>
<td class="bentyLogo"><a
href="http://www.comp.benty.ac.uk/parsic"><img
src="images/benty.png" /></a></td>
</tr>
<tr>
<td class="logoPad">__</td>
<td class="pad2">__</td>
<td class="title">
<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
</td>
<td class="bentyText"><a
href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
</tr>
<tr>
<td class="logoPad">__</td>
<td class="pad2">__</td>
<td class="title">
<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
</td>
<td class="bentyText"><a
href="http://www.comp.benty.ac.uk/parsic">Mona Research GroupXXX</a></td>
</tr>
#./justdoit 1,3 tr file
<tr>
<td>�</td>
<td class="bentyLogo"><a
href="http://www.comp.benty.ac.uk/parsic"><img
<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
</td>
<td class="bentyText"><a
href="http://www.comp.benty.ac.uk/parsic">Mona Research Group</a></td>
</tr>
<tr>
<td class="logoPad">__</td>
<td class="pad2">__</td>
<td class="title">
<h2>Cat-by-Cat mybook - Cook (56:47)</h2>
</td>
<td class="bentyText"><a
href="http://www.comp.benty.ac.uk/parsic">Mona Research GroupXXX</a></td>
</tr>
#!/bin/bash
## justdoit SED scripts##
if [ $# != 3 ] ; then
echo "Usage: $0 tagpair_numbers[e.g. 1,3] tagname[tr,pre,table..] inputfile
e.g. '$0 1,3 tr index.html'" ; exit 1
fi
removes=$1
if [ ! $(echo "$removes"|sed -n '/[1-2],[3-6]/p') ] ; then
echo "Tag pair numbers must be in [1-2],[3-6] format" ; exit 1
fi
tag=$2
f=$(echo $removes|sed 's/\([0-9]\),[0-9]/\1/')
l=$(echo $removes|sed 's/[0-9],\([0-9]\)/\1/')
if (( $f >= 3 )) ; then
echo "The script only works when the first value is 1 or 2" ; exit 1
fi
file=$3
cp "$file" "${file}bck" ; if [ $? -ne 0 ] ; then
echo "Backing up the input file failed" ; exit 1
fi
case $f in
1) start=4 ;;
2) start=2 ;;
esac
case $l in
3) scale=6 ;;
4) [ $f = 1 ] && scale=(2 6) || scale=4 ;;
5) [ $f = 1 ] && scale=(2 2 6) || scale=(4 2 4) ;;
6) [ $f = 1 ] && scale=(2 2 2 6) || scale=(4 2 2 4) ;;
esac
taglines=$(sed -n '/[/]*'$tag'/=' $file|sed -n '$=')
tagpairlines=$(sed -n '/[/]*'$tag'/=' $file|sed '$!N;s/\n/,/'|sed -n '$=')
calv=$(echo $l | awk '{print '$taglines' / $1 }')
[ $(echo $calv | sed -n '/[0-9]\.[0-9]*$/p') ] && calv=$(echo ${calv%.*}) && calv=$(( calv + 1))
sedarrix=$(( tagpairlines - calv ))
x=0 ; limit=${#scale[@]}
pattern=("$start s/,*//;")
second=start
while [ $(( sedarrix -=1 )) -gt -1 ] ; do
resume=${scale[x]}
second=$(( second + resume ))
pattern=(${pattern[@]} "$second s/.*//;")
((x++)) ; [ $x = $limit ] && x=0
done
fullsed="sed -n '/[/]*'$tag'/=' $file | sed '$!N;s/\n/,/; ${pattern[@]} '"
fullsedarr=($(eval $fullsed |sed ':a;$!N;s/\n/ /;ta') ) ;
x=0 ; fullsedarrix=${#fullsedarr[@]}
removepairs=${fullsedarr[x]}
# open file and sed processing.
while [ $(( fullsedarrix -=1 )) -gt 1 ] ; do
sed ' '$removepairs' d' $file >${file}tmp && mv ${file}tmp $file
((x++)) ; correctsedv=$(($(echo ${fullsedarr[x]}|sed 's/\([0-9]*\),\([0-9]*\)$/\2-\1+1/')))
fullcorrectsedv=$((fullcorrectsedv+correctsedv))
corrv=($(echo ${fullsedarr[x]}|sed 's/\([0-9]*\),\([0-9]*\)$/\1-'$fullcorrectsedv' \2-'$fullcorrectsedv'/'))
y=0 ; for i in ${corrv[@]} ; do reorg[y]=$(($i)); ((y++)) ; done
removepairs=$(echo ${reorg[@]}|sed 's/ /,/' )
done
more $file
regards
ygemici
Try this...
# to delete the 1st and 3rd <tr>...</tr> pairs
awk '/<tr>/ {i=i+1} {if(i==1 || i==3){next} print}' file
If you then want to delete, say, only the second pair, tweak the code above as follows:
# to delete the 2nd <tr>...</tr> pair
awk '/<tr>/ {i=i+1} {if(i==2){next} print}' file
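If the list of blocks changes often, the same counting idea can be wrapped in a small function. This is only a sketch: delete_blocks and its arguments are made-up names, and it assumes the opening and closing tags each sit on a line of their own, as in your sample. Unlike the one-liners above, it also matches <tr ...> with attributes and stops deleting at </tr> rather than at the next <tr>.

```shell
#!/bin/sh
# Hypothetical helper (name and interface are assumptions, not from the thread):
# print FILE without the Nth <TAG>...</TAG> blocks named in LIST.
# usage: delete_blocks LIST TAG FILE      e.g. delete_blocks 1,3 tr file
delete_blocks() {
    awk -v list="$1" -v tag="$2" '
        BEGIN {
            n = split(list, nums, ",")               # "1,3" -> nums[1]=1, nums[2]=3
            for (k = 1; k <= n; k++) drop[nums[k]] = 1
            open_re  = "<" tag "[ >]"                # <tr> or <tr class=...>
            close_re = "</" tag ">"
        }
        $0 ~ open_re { i++; inblock = 1 }            # count every opening tag
        inblock && (i in drop) {                     # inside a block to delete
            if ($0 ~ close_re) inblock = 0
            next
        }
        $0 ~ close_re { inblock = 0 }                # end of a block that is kept
        { print }
    ' "$3"
}

# Small demo on a made-up sample with three blocks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tr>
<td>one</td>
</tr>
<tr>
<td>two</td>
</tr>
<tr>
<td>three</td>
</tr>
EOF
delete_blocks 1,3 tr "$sample"    # keeps only the block containing "two"
rm -f "$sample"
```

To change the file itself, write the result to a temporary file and move it over the original, e.g. delete_blocks 1,3 tr file > file.new && mv file.new file.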
regards,
Ahamed
Your code is long ygemici. Thanks.
---------- Post updated at 08:57 AM ---------- Previous update was at 08:47 AM ----------
Thank you Mr Ahamed. Your code is much better.
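For the pattern-based idea in the first post (delete only the <tr> blocks that contain pad2, wherever they are), one possible approach is to buffer each block in awk and print it only if it does not match. This is a rough sketch rather than a tested solution for the full page: delete_tr_matching is a made-up name, and it assumes each opening <tr> or <tr ...> tag is complete on one line.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print FILE without the
# <tr>...</tr> blocks whose text matches the regular expression PATTERN.
# usage: delete_tr_matching PATTERN FILE
delete_tr_matching() {
    awk -v pat="$1" '
        /<tr[ >]/ { inblock = 1; buf = "" }   # opening tag: start buffering
        inblock {
            buf = buf $0 "\n"                 # collect the lines of this block
            if (/<\/tr>/) {                   # closing tag: keep or drop the block
                if (buf !~ pat) printf "%s", buf
                inblock = 0
            }
            next
        }
        { print }                             # lines outside any block pass through
    ' "$2"
}

# Small demo on a made-up sample with two rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table>
<tr>
<td class="pad1">a</td>
</tr>
<tr>
<td class="pad2">b</td>
</tr>
</table>
EOF
delete_tr_matching 'class="pad2"' "$sample"   # drops only the pad2 row
rm -f "$sample"
```

Because the decision is made per block, this does not depend on how many other <tr> pairs the file has or where they are.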