REGEX: Matching Null?

I'm using the URL regex feature of Squid to allow sites via a list of regex strings that match allowed domains. The regex was copied from our previous proxy solution and it seemed to "just work". But we've recently discovered that some domains fail if they are used without the www prefix in the URL (likely due to virtual host or host header configuration, depending on whether the server is Apache or IIS respectively). Below is an example of what sometimes works:

http://.*\.microsoft\.com/.*

The '.*\.' before the 'microsoft\.com' portion SHOULD mean any number of any characters (zero or more) followed by a '.'. I see that the error lies in the '\.' portion of the regex and plan to fix that. However, I've been unable to find a way to match both 'www.microsoft.com' and 'microsoft.com'. Here's what I thought would work:

http://[!.*|.*\.]microsoft\.com/.*

I admit to being really bad with regex, so please don't be too hard on me. :) I've just never been able to "get it" 100%. Needless to say, the above doesn't work for me at all. It matches neither 'microsoft.com' nor 'www.microsoft.com'. I've done some limited testing with 'grep' to try to find an adequate solution. But what is it that I'm really trying to match? At first, I assumed I wanted a whitespace character, but I'm not looking for ' microsoft.com'. Then I thought, a null? But that seems impossible to match, because there's no character there to match against. I'm sure a regex expert would look at this and provide something insanely simple. I really don't want to do this:

http://[.*\.microsoft\.com/.*|microsoft\.com/.*]

or worse, this:

http://.*\.microsoft\.com/.*
http://microsoft\.com/.*

Any suggestions? Thanks in advance...

I am unfamiliar with Squid, and maybe regexps work differently there, but it looks to me like you need the '?' operator which matches the preceding expression 0 or 1 times, e.g.

http://(www\.)?microsoft\.com/

does what you want when used as a grep -E (extended regex) argument.
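
For anyone who wants to check this outside of Squid, here is a minimal sketch in Python. It assumes Python's `re` module is a fair stand-in for the extended regex syntax used by `grep -E`, which holds for these simple constructs; the helper name `matches` is made up for illustration. It contrasts the bracket attempt from the question with the optional group. Square brackets form a character class, which matches exactly one character from the set, so they cannot express "either this sub-pattern or nothing".

```python
import re

# The bracket attempt from the question: [...] is a character class that
# matches exactly ONE character out of the listed set, not one of two
# alternative sub-patterns.
bracket = re.compile(r"http://[!.*|.*\.]microsoft\.com/.*")

# The suggested fix: a parenthesized group made optional with '?',
# i.e. the group may occur zero or one time.
optional = re.compile(r"http://(www\.)?microsoft\.com/")

def matches(pattern, url):
    """Unanchored search, similar to how grep tests a line (hypothetical helper)."""
    return pattern.search(url) is not None

for url in ("http://microsoft.com/", "http://www.microsoft.com/"):
    print(url, "bracket:", matches(bracket, url), "optional:", matches(optional, url))
```

The bracket version only matches when a single character such as '.' or '|' sits between 'http://' and 'microsoft', which is why neither real hostname got through.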

Your suggestion wound up working for me. I changed all of my lines to the following format:

http://(.*\.)?microsoft\.com/.*

That seems to have worked well. I knew someone on here would find this to be a simple problem to solve. :)
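
A quick sketch of how the final pattern behaves, again using Python's `re` as an assumed stand-in for Squid's regex engine (not verified against Squid itself). One thing worth knowing: '.*' also matches '/', so the pattern accepts URLs where 'microsoft.com/' only appears in the path. The `strict` variant and the example.com URL below are hypothetical, offered as one possible way to tighten the match, not as something the thread tested.

```python
import re

# Pattern from the post above: optional "anything followed by a dot" prefix.
loose = re.compile(r"http://(.*\.)?microsoft\.com/.*")

# Hypothetical tighter variant: anchor at the start and keep the optional
# prefix inside the hostname by excluding '/' from it.
strict = re.compile(r"^http://([^/]*\.)?microsoft\.com/")

def allowed(pattern, url):
    """Unanchored search unless the pattern itself anchors (hypothetical helper)."""
    return pattern.search(url) is not None

urls = [
    "http://microsoft.com/",
    "http://www.microsoft.com/downloads",
    "http://notmicrosoft.com/",
    "http://example.com/a.microsoft.com/",  # made-up URL for illustration
]

for url in urls:
    print(url, "loose:", allowed(loose, url), "strict:", allowed(strict, url))
```

Both patterns accept the bare domain and any subdomain and reject 'notmicrosoft.com'; they differ only on the last URL, where the loose pattern matches inside the path.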