sed - delete content inside tags multiline

I need to remove a certain part of the content below:
==Image Gallery== followed by <gallery> and everything up to and including </gallery>.

test SED1

==Image Gallery==
<gallery>
Image:car1.jpg| Car 1<sup>1</sup>
Imagem: car2.jpg| Car2<sup>2</sup>
</gallery> test SED2

==Image Gallery==<gallery> Image:car3.jpg | car3<sup>1</sup> </gallery>test SED3
test SED4
test SED5 ==Image Gallery== <gallery>
Image: plane1.jpg | plane1 <sup> 1
</sup> Image:plane2.jpg | plane2 <sup>2</sup>
</gallery> teste SED6
test SED7

With this:

sed -e '/==Image Gallery==.*<gallery>/ { :k s/<gallery.*[^gallery>]*\/gallery>//g; /</ {N; bk } }' file

I got this:

test SED1

==Image Gallery==
<gallery>
Image:car1.jpg| Car 1<sup>1</sup>
Imagem: car2.jpg| Car2<sup>2</sup>
</gallery> test SED2

==Image Gallery==test SED3
test SED4
test SED5 ==Image Gallery==  teste SED6
test SED7


The result should be:

test SED1

test SED2

test SED3
test SED4
test SED5  teste SED6
test SED7

What am I missing?

If Perl is an option:

perl -0777 -pe 's/==Image Gallery==.?<gallery>.*?<\/gallery>\s?//gs' file

Or using mawk or GNU awk:

awk 'NR%2' RS='==Image Gallery==|</gallery>' ORS= file
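The awk version works by making the record separator a regular expression, so every ==Image Gallery== and every </gallery> ends a record. The odd-numbered records (NR%2 is 1) are the text outside the blocks, and the empty ORS glues them back together. A small sketch on made-up input, assuming gawk or mawk (a regular-expression RS is not supported by every awk) and assuming each heading is always followed by exactly one </gallery>:

```shell
# Made-up input: text before, inside and after one gallery block.
input='keep A ==Image Gallery==
<gallery>
Image:x.jpg|x
</gallery> keep B
keep C'

# RS is a regular expression: each heading and each </gallery> ends a record.
# Odd records lie outside the blocks, even records inside them.
# ORS is empty, so the kept records are concatenated without separators.
result=$(printf '%s\n' "$input" |
  awk -v RS='==Image Gallery==|</gallery>' -v ORS= 'NR%2')
printf '%s\n' "$result"
```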

---
One problem with your sed attempt is the bracket expression [^gallery>]. Negation in a bracket expression works only on single characters, not on strings: the construct effectively means "a single character that is not g, a, l, e, r, y or >".

So you cannot force lazy matching this way.
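The sketch below illustrates both points with plain sed on made-up strings: the bracket expression is applied to one character at a time, and the usual sed substitute for lazy matching, [^<]*, only works when the skipped text cannot contain the first character of the closing tag. That condition does not hold for the gallery blocks here, because their entries contain <sup> tags.

```shell
# Each character is tested on its own against [^gallery>]:
# g, a, l, e, r, y are kept, every other character becomes "_".
per_char=$(printf '%s\n' 'gallery photo' | sed 's/[^gallery>]/_/g')
printf '%s\n' "$per_char"    # gallery______

# Stopping at the first "<" emulates lazy matching, but only because
# the text between <b> and </b> never contains "<".
no_tags=$(printf '%s\n' 'x<b>1</b>y<b>2</b>z' | sed 's/<b>[^<]*<\/b>//g')
printf '%s\n' "$no_tags"     # xyz
```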

How about this:

sed -n 'H;g;s#==*Image.*</gallery>##g;h; $p' file

test SED1

 test SED2

test SED3
test SED4
test SED5  teste SED6
test SED7

That works for the sample, but there is no lazy matching: if the pattern appears twice on one line, the greedy .* also deletes the text between the two blocks.
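The following sketch reproduces that limitation with a made-up line that holds two gallery blocks and a word, KEEP, between them. It runs the accumulate-and-substitute sed command from above unchanged:

```shell
# Two gallery blocks on one line; KEEP sits between them and should survive.
line='a ==Image Gallery==<gallery>x</gallery> KEEP ==Image Gallery==<gallery>y</gallery> c'

# .* is greedy: it runs from the first "==Image" to the LAST "</gallery>".
# The leading empty line comes from H appending to the initially empty hold space.
greedy=$(printf '%s\n' "$line" | sed -n 'H;g;s#==*Image.*</gallery>##g;h;$p')
printf '%s\n' "$greedy"
```

Everything from the first ==Image to the last </gallery> is gone, including KEEP; only "a  c" remains after the empty line.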

An alternative would be to use an arbitrary replacement character (for example ~, provided it does not occur in the input), something like:

sed '1h;1!H;$!d;g;s#</gallery>#~#g;s#==Image Gallery==[^~]*~##g' file
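A sketch of the replacement-character idea on a made-up line with two blocks. It uses ~ as the marker, an arbitrary choice that assumes ~ never appears in the input. Once every </gallery> is a single character, [^~]* can stop at the nearest closing tag, even though the block itself contains </sup>:

```shell
line='a ==Image Gallery==<gallery>x<sup>1</sup></gallery> KEEP ==Image Gallery==<gallery>y</gallery> c'

# 1h;1!H;$!d;g collects the whole input in the pattern space,
# then </gallery> becomes "~" and each heading is deleted up to the next "~".
result=$(printf '%s\n' "$line" |
  sed '1h;1!H;$!d;g;s#</gallery>#~#g;s#==Image Gallery==[^~]*~##g')
printf '%s\n' "$result"    # a  KEEP  c
```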

--
In jethrow's Perl suggestion, lazy matching is done with the non-greedy quantifier ?, as in .*?


Several of these solutions work, but my test file covers only a small part of what needs to be checked. The real input is an SQL INSERT with many words and many blocks like the ones above, and it also contains literal '\n' sequences in the middle of the text, which complicates things further because it is an SQL dump of an entire MediaWiki table.
But I managed to solve the initial problem using sed and Perl:

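# Step 1 (sed, line by line): remove the heading, up to 2 characters between "==" and "Gallery" and up to 12 after it
sed 's/==.\{0,2\}Gallery.\{0,12\}==//g' "$arqOriginal" > "$arqSed1"
# Step 2 (perl, whole file at once): lazily remove each <gallery>...</gallery> block plus one following whitespace character
perl -0777 -pe 's/<gallery>.*?<\/gallery>\s?//gs' "$arqSed1" > "$arqSed2"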