Please explain this SED expression

Can anyone please explain what this code does?

sed ':t /<VirtualHost/,/VirtualHost>/ { /VirtualHost>/!{ $!{ N; bt } }; /name/d; }' infile

This sed command will remove lines containing "name" that fall between the <VirtualHost ...> and </VirtualHost ...> elements.

More precisely, it removes the whole entry (except its first line) if any line in the entry contains "name":

$ cat hosttest
stufff
name = name
more stuff
<VirtualHost>
qqqq
qqq2
qqqq3
name=qname
qqqqqq7
qqqqq9
</VirtualHost>
<VirtualHost>
zzzzzz
zzzzzz2
zzzz3
jjje=zzzz
</VirtualHost>
$ sed ':t /<VirtualHost/,/VirtualHost>/ { /VirtualHost>/!{ $!{ N; bt } }; /name/d; }' hosttest
stufff
name = name
more stuff
<VirtualHost>
<VirtualHost>
zzzzzz
zzzzzz2
zzzz3
jjje=zzzz
</VirtualHost>
 $

There is a leftover
<VirtualHost>
(the one just after "more stuff"), which makes the output ugly.

That unwanted "<VirtualHost>" is there because the /VirtualHost>/ pattern matches both the opening and the closing tag. As a result, /VirtualHost>/!{ $!{ N; bt } } is not executed for the first line of the block, so that line is never joined to the rest of the block. Unless it happens to contain "name" itself, it is emitted by the implicit print at the end of the sed script, since -n is not used. This also means that there is a logic error: /name/d is tested against the first line of each block on its own, in addition to being tested after the rest of the block has been accumulated (which is what is intended).
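A quick way to see the overlap is to feed just the two tags to each pattern. This is only a small sketch; the variable names are placeholders, and the second pattern is one possible way of anchoring on the closing tag only:

```shell
# Two-line sample: an opening tag and a closing tag.
tags=$(printf '%s\n' '<VirtualHost>' '</VirtualHost>')

# The original end pattern matches BOTH lines.
loose=$(printf '%s\n' "$tags" | sed -n '/VirtualHost>/p')

# A pattern that includes "</" matches only the closing tag.
strict=$(printf '%s\n' "$tags" | sed -n '/<\/VirtualHost>/p')

printf 'loose pattern matched:\n%s\n' "$loose"
printf 'strict pattern matched:\n%s\n' "$strict"
```

The loose pattern prints both tags, while the strict one prints only "</VirtualHost>".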

To fix this problem, tighten up the regular expressions so that the start and end patterns each match only their own tag. The following should be better (I am not writing it on one line since my sed does not support the GNU extensions which allow labels to be followed by anything other than a newline):

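# Loop label: execution jumps back here after each appended line.
:t
# Act only on lines from <VirtualHost> through </VirtualHost>.
/<VirtualHost>/,/<\/VirtualHost>/ {
    # While the closing tag is not yet in the pattern space...
    /<\/VirtualHost>/! {
        # ...and this is not the last line of input...
        $! {
            # ...append the next input line and start over.
            N
            b t
        }
    }
    # Block collected (or input exhausted): delete it if it contains "name".
    /name/d
}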
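As a rough check, the script can be saved to a file and run with sed -f against the sample input from earlier in the thread. In the sketch below, the file name fix.sed, the temporary directory and the variable names are placeholders, not anything from the original post, and the result was reasoned out for a sed that treats ranges the way GNU sed does:

```shell
#!/bin/sh
# Work in a throwaway directory (placeholder location).
workdir=$(mktemp -d)

# Sample input, copied from the hosttest file shown in the thread.
cat > "$workdir/hosttest" <<'EOF'
stufff
name = name
more stuff
<VirtualHost>
qqqq
qqq2
qqqq3
name=qname
qqqqqq7
qqqqq9
</VirtualHost>
<VirtualHost>
zzzzzz
zzzzzz2
zzzz3
jjje=zzzz
</VirtualHost>
EOF

# The corrected script, one command per line.
cat > "$workdir/fix.sed" <<'EOF'
:t
/<VirtualHost>/,/<\/VirtualHost>/ {
    /<\/VirtualHost>/! {
        $! {
            N
            b t
        }
    }
    /name/d
}
EOF

# Run the script from the file and show the result.
result=$(sed -f "$workdir/fix.sed" "$workdir/hosttest")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With this input, the first block (which contains name=qname) should disappear completely, including its opening <VirtualHost> line. The second block and the "name = name" line outside any block are left alone.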

Regards,
Alister
