Dealing with XML comments

I'm writing my own simple XML parser as an experiment. It's a lot more complicated than it's supposed to be.

Things that are supposedly forbidden in XML comments turn up all the time in the wild. You're never, ever supposed to find -- inside <!-- XML comments -->, but in practice you don't just find that, you find while(j-->d) inside them. I'm not sure how an XML parser is supposed to handle that without choking. Does it think, "Hm, my XML comment seems to be followed by 37233 characters of garbage and then some invalid syntax, so I should go back and assume that's all inside the comment"?

I guess parsers work top-down, drilling from the outer level into the inner levels.
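
To make the comment case concrete, here is a minimal sketch in Python of how a hand-written scanner might handle it. The XML grammar does not allow -- inside a comment, so a strict parser can report an error rather than backtrack, while a lenient one simply ends the comment at the first -->. The function name scan_comment and its strict flag are made up for illustration and are not taken from any real library.

```python
def scan_comment(text, start, strict=True):
    """Scan one comment beginning at text[start].

    Returns (comment_body, index_just_after_the_comment).
    Hypothetical helper for illustration only.
    """
    if not text.startswith("<!--", start):
        raise ValueError("no comment at offset %d" % start)

    body_start = start + 4
    # The comment ends at the FIRST "-->"; the scanner never looks back.
    end = text.find("-->", body_start)
    if end == -1:
        raise ValueError("unterminated comment")

    body = text[body_start:end]
    # The XML grammar forbids "--" in the body and a body ending in "-"
    # (which would make the terminator look like "--->").
    if strict and ("--" in body or body.endswith("-")):
        raise ValueError("'--' is not allowed inside a comment")

    return body, end + 3


doc = "<!-- while(j-->d) --><a/>"
body, pos = scan_comment(doc, 0, strict=False)
rest = doc[pos:]  # whatever follows the comment is the caller's problem
```

Here the comment ends at the --> inside j-->d, so body is " while(j" and the leftover "d) --><a/>" is handed back as ordinary content. No going back is involved; the scanner just carries on from where the comment stopped.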

An XML parser is an extremely complicated bit of code if it is expected to cover the full XML specification. There are a number of different parsing models: DOM, SAX, StAX and more. For some ideas, have a look at the source code for libxml.
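
As a rough illustration of how two of those models differ, here is a small sketch using Python's standard library (xml.dom.minidom for DOM, xml.sax for SAX). The sample document and the ItemCollector class are invented for this example. DOM builds the whole tree first and lets you query it; SAX calls you back as each piece is encountered.

```python
import xml.sax
from xml.dom import minidom

# Made-up sample document for the example.
DOC = "<root><!-- a comment --><item>one</item><item>two</item></root>"

# DOM: parse everything into a tree, then walk or query the tree.
dom = minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


# SAX: the parser pushes events to a handler while it reads the input.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self.items.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self.items[-1] += content

    def endElement(self, name):
        if name == "item":
            self._in_item = False


handler = ItemCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items
```

Both approaches end up with the same two item strings. The difference is that the DOM version holds the whole document in memory, while the SAX version only keeps what the handler chooses to store.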