Just thought I'd let everyone know that I decided to implement this in awk in the end, since processing the syntax of a gazillion shell scripts is easier to do with awk: with "\n" as the RS, each line arrives in $0 and can be processed the way I wanted, leaving whitespace intact.
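For anyone curious what line-at-a-time processing looks like, here is a minimal sketch of the idea, not the actual script. The function name highlight_sh and the CSS class "comment" are made up for illustration, and it only handles whole-line comments. It assumes all that is needed is to escape HTML characters and wrap comment lines, working on $0 directly so leading and inner whitespace are never touched.

```shell
#!/bin/sh
# Hypothetical sketch: read shell source on stdin, write highlighted HTML.
# highlight_sh and the "comment" class are illustrative names only.
highlight_sh() {
    awk '
    BEGIN { RS = "\n"; print "<pre>" }   # one record per line
    {
        line = $0                        # $0 is never split, so spacing survives
        gsub(/&/, "\\&amp;", line)       # escape & first, then < and >
        gsub(/</, "\\&lt;", line)
        gsub(/>/, "\\&gt;", line)
        if (line ~ /^[ \t]*#/)           # whole-line comments only
            line = "<span class=\"comment\">" line "</span>"
        print line
    }
    END { print "</pre>" }
    '
}
```

Usage would be something like `highlight_sh < myscript.sh > myscript.html`. A real highlighter also needs to cope with quoting, here-documents and trailing comments, which this sketch ignores.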
I am writing a script which will convert shell syntax to colour-highlighted HTML. I found scripts to do this for just about every language EXCEPT for humble old SH, so I decided to do it myself! You can view a sample output from the script HERE. The script itself is still being tested and thus is not yet online.
Sorry for the "shameless plug" but I thought I'd let everybody know what the purpose of my original post was.