Regular expression match

echo 20110101 | awk '{ print match($0,/^((17||18||19||20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/))

I am getting a match for the above, whereas it shouldn't match, as there is no hyphen in the echoed date.

Another question: what is the difference between || and | in the above statement?

The string || in an ERE (outside of a bracket expression) produces undefined results. Otherwise, a | in an ERE (outside of a bracket expression) separates alternatives to be matched by the ERE.

In an ERE, the expression -* matches zero or more hyphens.

In your awk statement, you not only have a few undefined terms in your ERE, you also have an extra ) , a missing } , and a missing ' ; so there is no way that that awk statement produced any output other than a diagnostic message.

With the awk script:

echo 20110101 | awk '
{match($0,/^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/)
 print RSTART, RLENGTH
}'

the output is:

1 8

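RSTART is 1 and RLENGTH is 8, so the whole 8-character string matched: 2011 matches (17|18|19|20)[0-9][0-9], each 01 matches 0[1-9] (once in the month group and once in the day group), and each -* matches zero hyphens.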

-* means any number of hyphens, from zero to many. So a string with zero hyphens at that position still matches.
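To see the zero-or-more behaviour in isolation, here is a small sketch using a much shorter ERE, ^a-*b$, in place of the date pattern. The function name hyphen_demo and the test strings are made up for illustration only; match(), RSTART and RLENGTH are standard awk.

```shell
# Print each input followed by RSTART and RLENGTH for the ERE ^a-*b$.
# awk sets RSTART to 0 and RLENGTH to -1 when match() finds nothing.
hyphen_demo() {
    printf '%s\n' "$@" | awk '{ match($0, /^a-*b$/); print $0, RSTART, RLENGTH }'
}

hyphen_demo ab a-b a--b a+b
# ab 1 2
# a-b 1 3
# a--b 1 4
# a+b 0 -1
```

The first line is the interesting one: ab has no hyphen at all, yet it matches, for the same reason 20110101 matches the date ERE.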

I think what is missing here is the follow-up comment that if you want to ensure that a hyphen is there, keep it as a literal character without any special meaning, i.e. drop the meta-character *

You would change this expression:-

....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$....

...to this:-

....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$....
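As a quick check of the stricter expression, something like the following could be run. The wrapper name check_date and the sample inputs are only for illustration; the ERE is the one shown just above, with the leading and trailing dots left off.

```shell
# Report whether each argument matches the ERE with mandatory hyphens.
check_date() {
    printf '%s\n' "$@" | awk '
    {
        if ($0 ~ /^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$/)
            print $0 ": match"
        else
            print $0 ": no match"
    }'
}

check_date 20110101 2011-01-01 11-1-1 2011-13-01
# 20110101: no match
# 2011-01-01: match
# 11-1-1: match
# 2011-13-01: no match
```

Note that this only checks the shape of the string and the ranges of the individual fields; a date such as 2011-02-31 would still be accepted.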

Does this help clarify what you need to do?

Robin

The part that I found strange about the given ERE is that it will accept dates like 2015115 (which is ambiguous) as well as 2015-1-15 and 2015-11-5 (both of which are clear as to where the break is between the month and day). And, although 20151-15 and 201511-5 might be unambiguous, I'm not sure that I would want to accept them as "valid" date input (and both of these are also accepted by the given ERE).
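For anyone who wants to confirm this, a sketch along these lines feeds those five strings to the original ERE (with the || typos replaced by |). The function name lenient_check is just a label chosen for this example.

```shell
# Run each argument against the original ERE, in which the hyphens are optional.
lenient_check() {
    printf '%s\n' "$@" | awk '
    {
        ok = ($0 ~ /^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/)
        print $0 ": " (ok ? "accepted" : "rejected")
    }'
}

lenient_check 2015115 2015-1-15 2015-11-5 20151-15 201511-5
# 2015115: accepted
# 2015-1-15: accepted
# 2015-11-5: accepted
# 20151-15: accepted
# 201511-5: accepted
```

2015115 is accepted because it can be split either as 2015, 1, 15 or as 2015, 11, 5; the ERE has no way to say which reading was intended.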