Script to put block comment after finding regex in xml file

Poki · January 10, 2011, 4:04pm

hi,
i need my bash script to find a regex in an xml file and comment out the 2 lines before and after the line that contains the regex. can't use #; it needs to be <!-- at the start and --> at the end of the comment.

so eg..

first block

<filter>
        <filter-name>MyRegEx</filter-name>
        <filter-class>MyRegExblablabla</filter-class>
    </filter>

second block

<filter-mapping>
        <filter-name>MyRegEx</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

so output should be

first block
<!--
 <filter>
        <filter-name>MyRegEx</filter-name>
        <filter-class>MyRegExblablabla</filter-class>
    </filter>

-->
 
second block
 
 
<!-- 
<filter-mapping>
        <filter-name>MyRegEx</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

 -->

unfortunately since it's an xml file, # is not functioning.
obviously there is no problem finding the regex, and i can also find the line numbers of the regex. the difficulty is to wrap the 2 lines before and after it with <!-- and -->.
Does anyone know how to do it? i have been working on it for quite some time now..

Thank you

DGPickett · January 10, 2011, 5:08pm

You want to comment out lines: try sed, using N to get the entire element in /<\([^ >]*\).*<\/\1>/, and then wrap it in comment markup with s/.*/<!--&-->/ or similar.

Poki · January 10, 2011, 5:14pm

what do you mean by this?

"using N to get the entire element in /<\([^ >]*\).*<\/\1>/, "

rdcwayx · January 10, 2011, 6:13pm

awk '/<filter>/||/<filter-mapping>/ {print "<!--\n" $0;next} 
     /<\/filter>/||/<\/filter-mapping>/ {print $0 "\n-->";next}1 ' infile

Poki · January 11, 2011, 2:12pm

thanks .. but i have lots of sections that look like this.. i only need to comment section that has MyRegEx

    <filter>
        <filter-name>something</filter-name>
        <filter-class>something</filter-class>
        <init-param>
            <param-name>filterKey</param-name>
            <param-value>remote</param-value>
        </init-param>
    </filter>
     
   
    <filter>
        <filter-name>MyRegEx</filter-name>
        <filter-class>MyRegExblabla</filter-class>
    </filter>


    <filter>
        <filter-name>something</filter-name>
        <filter-class>somethingelse</filter-class>
        <init-param>

            <param-name>filterKey</param-name>
            <param-value>something</param-value>
        </init-param>
    </filter>

        <filter-mapping>
            <filter-name>something</filter-name>
            <url-pattern>/*</url-pattern>
        </filter-mapping>

    <filter-mapping>
        <filter-name>something</filter-name>
        <url-pattern>/servlet/something</url-pattern>
        <url-pattern>/servlet/something</url-pattern>
        <url-pattern>/servlet/something</url-pattern>
        <url-pattern>/servlet/something</url-pattern>
        <url-pattern>/servlet/something</url-pattern>
    </filter-mapping>


    <filter-mapping>
        <filter-name>something</filter-name>
        <url-pattern>/remote/*</url-pattern>
    </filter-mapping>


    <filter-mapping>
        <filter-name>something</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>


    <filter-mapping>
        <filter-name>MyRegEx</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

DGPickett · January 11, 2011, 2:30pm

Using sed N to get an entire element. Let's say the element name is MsgBlk:

 
sed '
  /<MsgBlk[ >]/{
    :loop
    /<\/MsgBlk>/!{
      $d
      N
      b loop
     }
    s/<MsgBlk[ >].*<\/MsgBlk>/<!--\
&\
-->/
   }
 '

Narrative:
Find the line with the opening element, and on it:
Set a branch target named loop.
If the element is not closed:
If at EOF, it is junk, delete it. Some seds are funny with N at EOF.
Pile the next line at the end of the buffer.
Go back to check for the close of the element.
Pick up just exactly this whole element, and where its ends were, put commenting markup and new lines around the whole element.
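
A sketch of how the loop above might be narrowed to only the elements that mention MyRegEx, assuming each opening and closing tag sits on its own line as in the web.xml fragment posted earlier. The sample input and the variable names are made up for illustration, and unlike the script above an unclosed element at EOF is printed unchanged rather than deleted.

```shell
# Hypothetical sample input; the element names come from the thread.
xml='<filter>
    <filter-name>something</filter-name>
</filter>
<filter>
    <filter-name>MyRegEx</filter-name>
</filter>
<filter-mapping>
    <filter-name>MyRegEx</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>'

commented=$(printf '%s\n' "$xml" | sed '
  # Start on a line that opens <filter> or <filter-mapping>.
  /<filter\(-mapping\)\{0,1\}>/{
    :loop
    # Keep appending lines until the closing tag is in the buffer.
    /<\/filter\(-mapping\)\{0,1\}>/!{
      $!{
        N
        b loop
      }
    }
    # Wrap the collected element only if it names MyRegEx.
    />MyRegEx</s/.*/<!--\
&\
-->/
  }
')
printf '%s\n' "$commented"
```

The only real change from the loop above is the address in front of the final s command: the whole element is already in the buffer at that point, so one extra regex test decides whether it gets wrapped.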

Poki · January 11, 2011, 4:02pm

i ran it as you specified (substituting MsgBlk with my regex), but the output hasn't changed at all..

DGPickett · January 11, 2011, 4:09pm

Results vary. I used a pretty rough but effective regex. Did you use an extended regex without telling sed?:

echo 'first block
<filter>
        <filter-name>MyRegEx</filter-name>
        <filter-class>MyRegExblablabla</filter-class>
    </filter>
second block
<filter-mapping>
        <filter-name>MyRegEx</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
' | sed '
  /<filter[-maping]*[ >]/{
    :loop
    /<\/filter[-maping]*>/!{
      $d
      N
      b loop
     }
    s/<filter[-maping]*[ >].*<\/filter[-maping]*>/<!--\
&\
-->/
   }
 '
first block
<!--
<filter>
        <filter-name>MyRegEx</filter-name>
        <filter-class>MyRegExblablabla</filter-class>
    </filter>
-->
second block
<!--
<filter-mapping>
        <filter-name>MyRegEx</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
-->

PS: For a variant element regex, the second test (element not closed) should use a back reference: whatever \(name\) I opened with, have I closed with that same name = \1 ?:

/<\(regex-name\)[ >].*<\/\1>/!{
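
A sketch of that back-reference test in a complete loop, assuming the opening tag is the first thing on its line (the pattern is anchored with ^ so that an inner element such as filter-name cannot satisfy the test on its own). The sample data and variable names are illustrative only.

```shell
# Hypothetical sample input using the element names from the thread.
sample='<filter>
    <filter-name>MyRegEx</filter-name>
</filter>
<filter-mapping>
    <filter-name>MyRegEx</filter-name>
</filter-mapping>'

wrapped=$(printf '%s\n' "$sample" | sed '
  /^ *<filter[-a-z]*>/{
    :loop
    # \1 must repeat the exact name captured from the opening tag.
    /^ *<\(filter[-a-z]*\)>.*<\/\1>/!{
      $!{
        N
        b loop
      }
    }
    s/.*/<!--\
&\
-->/
  }
')
printf '%s\n' "$wrapped"
```

With the back reference, an element opened as filter-mapping is only treated as closed once a matching /filter-mapping tag has been read, not by any other closing tag that happens to start with "filter".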

Poki · January 11, 2011, 6:29pm

trying a different way.. now..

is there a way with sed to remove more than one set of lines in one line?

so i mean

sed ${firstElem},${lastIndex}d web.xml > web1.xml

this will delete lines between ${firstElem},${lastIndex}

i want to do something like this in the same line (doesn't work so far)

sed ${firstElem},${lastIndex}d ; ${secondElem},${lastIndex1}d web.xml > web1.xml

that is, at the same time delete the lines between ${firstElem},${lastIndex} and ${secondElem},${lastIndex1}

if i don't do this in one line, the indexes for the second set change and i need to grep for them again, so it would be really good if i could delete the two sets simultaneously..

thanks a lot

DGPickett · January 13, 2011, 2:06pm

Well, sed will happily delete ranges with "/regex1/,/regex2/d", and you can do that as many times as you want (as long as one does not start within another).
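
A minimal sketch of two regex ranges deleted in a single sed run. The tags <a> and <b> and the "keep" lines are placeholders, not taken from the real web.xml.

```shell
# Two independent /start/,/end/ ranges, each given with its own -e.
kept=$(printf '%s\n' keep1 '<a>' x '</a>' keep2 '<b>' y '</b>' keep3 |
  sed -e '/<a>/,/<\/a>/d' -e '/<b>/,/<\/b>/d')
printf '%s\n' "$kept"
```

Only the three "keep" lines should survive, since each range swallows everything from its opening line through its closing line.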

Your xml is pretty well behaved, but I often normalize it so every element starts at the beginning of a line. That simplifies subsequent parsing, so you do not need to worry about one-line elements except ones like <xxx yyy="zzz" www="xxx" />. You could expand those too, so the file is entirely normalized.
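
One rough way such a normalization might look, assuming it is enough to strip leading whitespace and break the line wherever one tag directly follows another. This is a sketch and not a real XML parser; the input line is made up.

```shell
# Strip indentation, then split "><" so every tag starts its own line.
normalized=$(printf '%s\n' '  <filter><filter-name>MyRegEx</filter-name></filter>' |
  sed -e 's/^[[:space:]]*//' -e 's/></>\
</g')
printf '%s\n' "$normalized"
```

Text content stays attached to its opening tag, so a simple element like filter-name remains on a single line while the enclosing filter tags each get a line of their own.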

I often pile sed on sed in a pipeline. The sort of sed that loops and does N does not mix easily with those that do simple line filtering, so why not, pipes are free. You can even embed each sed command in an executable script file that begins with the line "#!/usr/bin/sed -f" (your path may vary) and execute the scripts.
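
A small sketch of both ideas, a sed script kept in its own file and two seds chained in a pipeline. The file name strip-blank.sed and the temp directory are invented for the demo, and the script is run through "sed -f" here so the example does not depend on where sed is installed.

```shell
# Hypothetical script file, created in a temp directory for the demo.
dir=$(mktemp -d)
script="$dir/strip-blank.sed"

# The first line lets the file be executed directly where sed lives at
# that path; sed itself treats the line as a comment.
cat > "$script" <<'EOF'
#!/usr/bin/sed -f
# Drop lines that are empty or whitespace only.
/^[[:space:]]*$/d
EOF
chmod +x "$script"

# First sed: the script file. Second sed: a simple line filter.
piped=$(printf '%s\n' '  <a>' '' '   ' '  </a>' |
  sed -f "$script" | sed 's/^[[:space:]]*//')
printf '%s\n' "$piped"
rm -rf "$dir"
```

If the interpreter path in the first line matches the local sed, the file could also be invoked directly in the pipeline in place of "sed -f".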

ghostdog74 · January 13, 2011, 7:08pm

gawk 'BEGIN{RS="</filter>|</filter-mapping>"}
/>MyRegEx</{ $0="<!--\n"$0RT"\n-->" }1 ' file

DGPickett · January 14, 2011, 2:57pm

If you use regexes rather than line numbers, then line numbering is not an issue. Otherwise, consider doing it like "diff -e" output: put the changes in reverse line number order.

sed numbers lines at input, so the numbers do not change on one pass, in any case.
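
A sketch of the two numeric ranges from the question deleted in one pass. The variable names are the ones used in the question, and the values and the eight input lines are made up.

```shell
# Hypothetical line numbers standing in for the grep results.
firstElem=2 lastIndex=3 secondElem=6 lastIndex1=7

# Both ranges refer to input line numbers, so neither shifts the other.
result=$(printf '%s\n' l1 l2 l3 l4 l5 l6 l7 l8 |
  sed -e "${firstElem},${lastIndex}d" -e "${secondElem},${lastIndex1}d")
printf '%s\n' "$result"
```

Each range goes in its own -e (a single quoted script with the commands separated by ; or a newline should work as well), and the file name goes after the whole script rather than in the middle of it.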