Replace multiple lines between tags using sed

dollylamb · November 24, 2008, 5:37am

I have a file, example.txt, whose content looks like this:

<TAG>
1
2
3
</TAG>

and I use a sed command to replace everything between <TAG> and </TAG>, as below:

sed -e 's/\(<TAG>\)[^<]*\(<.*\)/something/g' example.txt > example.txt.new

Unfortunately, the command fails to replace what I want; it only works if the content between the tags is not broken across multiple lines. Could someone please explain how to solve this case?

Thanks a lot.
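
sed reads its input one line at a time, so a pattern such as [^<]* never sees the newlines between <TAG> and </TAG>. A minimal sed-only sketch follows. It assumes the opening and closing tags each sit on their own line, and it uses a made-up sample file. It selects the range of lines from <TAG> to </TAG>, appends the new text after the opening tag, and deletes every other line in the range except the closing tag.

```shell
# Build a small sample file in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 'before' '<TAG>' '1' '2' '3' '</TAG>' 'after' > "$workdir/sample.txt"

# Inside the <TAG>..</TAG> range:
#   - on the opening tag, queue "something" to be printed after it
#   - delete every line that is neither the opening nor the closing tag
result=$(sed '/<TAG>/,/<\/TAG>/{
/<TAG>/{
a\
something
}
/<TAG>/!{
/<\/TAG>/!d
}
}' "$workdir/sample.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Lines outside the range pass through untouched, so the rest of the file is preserved.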

Franklin52 · November 24, 2008, 6:44am

Try this:

awk '/<TAG>/{p=1;print}/<\/TAG>/{p=0}!p' file

dollylamb · November 24, 2008, 7:20am

Many thanks, Franklin!

Your awk command doesn't replace the content between the tags; it deletes it. Now I can use a sed command to add the new content as expected. Thanks again for your help.

dennis.jacob · November 24, 2008, 7:28am

Try:

sed -n '/<TAG>/,/<\/TAG>/p' < file | sed  '/TAG/d'

Franklin52 · November 24, 2008, 7:31am

To replace the text with something you can try this:

awk '/<TAG>/{p=1;print;print "something"}/<\/TAG>/{p=0}!p' file
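
The one-liner above relies on a flag variable, p, that is switched on at the opening tag and off at the closing tag. The sketch below is the same program spread over several lines with comments, run against a made-up sample file. The file name and its contents are assumptions for the demonstration only.

```shell
# Build a small sample file in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 'before' '<TAG>' '1' '2' '3' '</TAG>' 'after' > "$workdir/sample.txt"

result=$(awk '
# Opening tag: start skipping, print the tag itself, then the new text.
/<TAG>/   { p = 1; print; print "something" }
# Closing tag: stop skipping, so the rule below prints this line.
/<\/TAG>/ { p = 0 }
# A bare pattern prints the current line; here, only while not skipping.
!p
' "$workdir/sample.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because p is unset (and therefore false) before the first opening tag, every line outside the tags is printed unchanged.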

dollylamb · November 24, 2008, 7:46am

I really appreciate your help, dennis, but Franklin's command works very well in my situation. I may need your help with future problems, and I owe you one this time.

jdv · May 11, 2009, 7:16am

Sorry to bring up this old thread but I couldn't find a better and more relevant place.

I have a similar situation, where I wish to remove the code between two tags in many thousands of files.

Here is the code snippet:

<AFFILIATECODEBEGIN>

<p align="center">
<script type="text/javascript"><!--
auctionads_ad_client = "editedforprivacy";
auctionads_ad_campaign = "42efbc14b4c2adfae40ff87882f07569";
auctionads_ad_width = "120";
auctionads_ad_height = "240";
auctionads_ad_kw =  "japan";
auctionads_color_border =  "CC0000";
auctionads_color_bg =  "FFFFFF";
auctionads_color_heading =  "000000";
auctionads_color_text =  "000000";
auctionads_color_link =  "FFFFFF";
--></script>

<script type="text/javascript" src="http://ads.auctionads.com/pagead/show_ads.js">
</script>

</p>

</AFFILIATECODEBEGIN>

I have tried running

awk '/<AFFILIATECODEBEGIN>/{p=1;print}/<\/AFFILIATECODEBEGIN>/{p=0}!p' festivals.html

I also tried using * instead of naming a specific file. The contents of the file are output, but nothing is removed or changed.

Thanks in advance for any suggestions.

Franklin52 · May 11, 2009, 7:33am

Try this:

awk '/<AFFILIATECODEBEGIN>/{p=1}/<\/AFFILIATECODEBEGIN>/{p=0;next}!p' festivals.html

panyam · May 11, 2009, 7:34am

Try this:

TESTBOX>awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next }
> /<\/AFFILIATECODEBEGIN>/ { print ;} ' html.txt

Output:

<AFFILIATECODEBEGIN>
something
</AFFILIATECODEBEGIN>

jdv · May 11, 2009, 7:56am

Thanks to you both. Franklin, your command output the contents of the file but with no changes, and the output of panyam's command was:

awk: cmd. line:1: /<AFFILIATECODEBEGIN>/ { print ; print "something "; next } > /<\/AFFILIATECODEBEGIN>/ { print ;} 
awk: cmd. line:1:                                                             ^ syntax error

The syntax error is the ">" between "next }" and "/<\/".

Franklin52 · May 11, 2009, 8:08am

This is what I get:

$ cat file
abc
abc
abc
<AFFILIATECODEBEGIN>

<p align="center">
<script type="text/javascript"><!--
auctionads_ad_client = "editedforprivacy";
auctionads_ad_campaign = "42efbc14b4c2adfae40ff87882f07569";
auctionads_ad_width = "120";
auctionads_ad_height = "240";
auctionads_ad_kw =  "japan";
auctionads_color_border =  "CC0000";
auctionads_color_bg =  "FFFFFF";
auctionads_color_heading =  "000000";
auctionads_color_text =  "000000";
auctionads_color_link =  "FFFFFF";
--></script>

<script type="text/javascript" src="http://ads.auctionads.com/pagead/show_ads.js">
</script>

</p>

</AFFILIATECODEBEGIN>
xyz
xyz
xyz
$
$
$ awk '/<AFFILIATECODEBEGIN>/{p=1}/<\/AFFILIATECODEBEGIN>/{p=0;next}!p' file
abc
abc
abc
xyz
xyz
xyz
$

panyam · May 11, 2009, 8:19am

Hi jdv,

FYI:

avalon:/disk1/jvsh/TEST>awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next }
> /<\/AFFILIATECODEBEGIN>/ { print ;} ' html.txt

The ">" is the character the shell shows on screen when you press ENTER in the middle of a command; it means the command continues on the next line. It is not part of the command.

If you put it on a single line:
 
awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next } /<\/AFFILIATECODEBEGIN>/ { print ;} ' input_file.txt

jdv · May 11, 2009, 8:48am

]# awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next }
> /<\/AFFILIATECODEBEGIN>/ { print ;} ' festivals.html
<AFFILIATECODEBEGIN>
something 
</AFFILIATECODEBEGIN>

Then nano festivals.html shows that nothing has changed in the actual file.

Thanks

panyam · May 11, 2009, 8:52am

Of course nothing will change in the actual file.

awk only writes to standard output, so you need to redirect the output of the command to another file to store the result.

jdv · May 11, 2009, 9:13am

Sorry to be a pain, but how does one do that? Will it still retain all the rest of the code in the files?

panyam · May 11, 2009, 9:37am

awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next } /<\/AFFILIATECODEBEGIN>/ { print ;} ' input_file.txt >> output_file.txt

The content of input_file.txt will remain the same.

The content of output_file.txt will be:

<AFFILIATECODEBEGIN>
something
</AFFILIATECODEBEGIN>
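
To change a file itself while keeping everything outside the tags, one common pattern is to write the filtered text to a temporary file and then move it over the original. The sketch below is a hedged illustration and not part of the replies above. It uses Franklin52's removal command from earlier in the thread, and the scratch directory and sample content are made up for the demonstration.

```shell
# Create a throwaway copy of a page to experiment on (sample content is hypothetical).
workdir=$(mktemp -d)
page="$workdir/festivals.html"
printf '%s\n' 'abc' '<AFFILIATECODEBEGIN>' 'ad code' '</AFFILIATECODEBEGIN>' 'xyz' > "$page"

# Write the filtered text to a temporary file, then replace the original
# only if awk exited successfully. Lines outside the tags are kept as-is.
awk '/<AFFILIATECODEBEGIN>/{p=1} /<\/AFFILIATECODEBEGIN>/{p=0;next} !p' "$page" > "$page.tmp" &&
  mv "$page.tmp" "$page"

cat "$page"
```

Redirecting straight back onto the input file (awk ... file > file) would truncate it before awk reads it, which is why the temporary file is needed.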

jdv · May 11, 2009, 9:43am

Thanks, but as per my first question, how can I do this for thousands of files?

panyam · May 12, 2009, 12:45am

 
for file in *
do
awk '/<AFFILIATECODEBEGIN>/ { print ; print "something "; next } /<\/AFFILIATECODEBEGIN>/ { print ;} ' "$file" > "${file}_changed"
done

jdv · May 12, 2009, 3:36am

Thanks, but I cannot change the filenames, and I need to do it recursively. I have .html and .htm files in many subdirectories, all of which need to have this code removed, but the files cannot be renamed or otherwise modified. Thank you for your kind help, and apologies for not being more knowledgeable about this.

panyam · May 12, 2009, 5:06am

Javed,

There is a flaw in my script: it also removes all the text outside the tags, which it should not. Better to use the solution suggested by Franklin. The code below will do the job for you. Test it thoroughly before using it in production.

 
 
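# Handle both .html and .htm files; read names line by line so paths with spaces survive.
find . -type f \( -name "*.html" -o -name "*.htm" \) | while IFS= read -r i
do
awk '/<AFFILIATECODEBEGIN>/{p=1;print;print"something in b/w tags";next}/<\/AFFILIATECODEBEGIN>/{p=0;print;next}!p' "$i" > "${i}_Chng" &&
mv "${i}_Chng" "$i"
done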
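
If the goal is to remove the block entirely, tags included, as originally asked, a variant using find -exec with Franklin52's removal command might look like the sketch below. The directory layout and sample content are made up so the example can run on its own; on real data, point find at the site root and take a backup first.

```shell
# Build a tiny directory tree to try this on (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub dir"
printf '%s\n' 'abc' '<AFFILIATECODEBEGIN>' 'ad code' '</AFFILIATECODEBEGIN>' 'xyz' > "$workdir/site/index.html"
cp "$workdir/site/index.html" "$workdir/site/sub dir/page.htm"

# For every .html or .htm file below the start directory, filter it into a
# temporary file and move that over the original if awk succeeded.
# Passing the names as arguments to sh keeps paths with spaces intact.
find "$workdir/site" -type f \( -name '*.html' -o -name '*.htm' \) -exec sh -c '
  for f in "$@"; do
    awk "/<AFFILIATECODEBEGIN>/{p=1} /<\/AFFILIATECODEBEGIN>/{p=0;next} !p" "$f" > "$f.tmp" &&
      mv "$f.tmp" "$f"
  done
' sh {} +

cat "$workdir/site/sub dir/page.htm"
```

The filenames stay the same, but because each file is replaced by a new one, its ownership and permissions may follow the current user and umask instead of the original values.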