echo 20110101 | awk '{ print match($0,/^((17||18||19||20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/))
I am getting a match for the above, whereas it shouldn't match, as there is no hyphen in the echoed date.
Another question: what is the difference between || and | in the above statement?
The string || in an ERE (outside of a bracket expression) produces undefined results. Otherwise, a | in an ERE (outside of a bracket expression) separates alternatives to be matched by the ERE.
In an ERE, the expression -* matches zero or more hyphens.
In your awk statement, you not only have a few undefined terms in your ERE, you also have an extra ), a missing }, and a missing '; so there is no way that that awk statement produced any output other than a diagnostic message.
With the awk script:
echo 20110101 | awk '
{match($0,/^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$/)
print RSTART, RLENGTH
}'
the output is:
1 8
The matching parts of the ERE are marked in red.
-* means any number of hyphens, from zero to many. So a date with zero hyphens still matches.
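To see the zero-or-more behaviour in isolation, here is a minimal sketch. The helper name zero_or_more_hyphens is made up for illustration, and the pattern is a cut-down stand-in for the full date ERE, not the original expression:

```shell
# Hypothetical helper: prints 1 when the whole input matches the ERE, else 0.
# The ERE is a simplified stand-in for the full date expression in this thread.
zero_or_more_hyphens() {
    printf '%s\n' "$1" | awk '{ ok = ($0 ~ /^2011-*01-*01$/); print ok }'
}

zero_or_more_hyphens 20110101       # prints 1: zero hyphens satisfy -*
zero_or_more_hyphens 2011-01-01     # prints 1: one hyphen at each position
zero_or_more_hyphens 2011--01--01   # prints 1: several hyphens also match
zero_or_more_hyphens 2011/01/01     # prints 0: a slash is not a hyphen
```

Note that the third case shows -* is probably looser than intended in both directions: it allows no hyphen at all and also runs of hyphens.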
I think what is missing here is the follow-up comment that if you want to ensure that a hyphen is there, keep it as a literal character without any special meaning, i.e. drop the meta-character *
You would change this expression:-
....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-*([1-9]|0[1-9]|1[012])-*([1-9]|0[1-9]|[12][0-9]|3[01])$....
...to this:-
....^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$....
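A quick way to check the changed expression is sketched below. The helper name is_hyphenated_date is only an illustration; the ERE inside it is the second expression above with the leading and trailing dots removed:

```shell
# Hypothetical helper: prints 1 if the input matches the ERE with
# mandatory hyphens, else 0.
is_hyphenated_date() {
    printf '%s\n' "$1" | awk '{
        ok = ($0 ~ /^((17|18|19|20)[0-9][0-9]|[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$/)
        print ok
    }'
}

is_hyphenated_date 20110101     # prints 0: no hyphens, so no match
is_hyphenated_date 2011-01-01   # prints 1: both hyphens present
```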
Does this help clarify what you need to do?
Robin
The part that I found strange about the given ERE is that it will accept dates like 2015115 (which is ambiguous) as well as 2015-1-15 and 2015-11-5 (both of which are clear as to where the break is between the month and day). And, although 20151-15 and 201511-5 might be unambiguous, I'm not sure that I would want to accept them as "valid" date input (and both of these are also accepted by the given ERE).
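If the intent is to accept either a fully hyphenated date or a fully unhyphenated one, one possible tightening is sketched below. This is an assumption about what the original poster wants, not something stated in the thread: the hyphenated form keeps one- or two-digit months and days, while the unhyphenated form requires exactly two digits for each. The helper name is_strict_date is made up for illustration.

```shell
# Hypothetical helper: prints 1 if the input is an accepted date form, else 0.
# Accepted forms (an assumption, not the original poster's stated rules):
#   year-month-day  with both hyphens, month and day of 1 or 2 digits
#   yearMMDD        with no hyphens, month and day of exactly 2 digits
is_strict_date() {
    printf '%s\n' "$1" | awk '{
        ok = ($0 ~ /^((17|18|19|20)[0-9][0-9]|[0-9][0-9])(-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])|(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01]))$/)
        print ok
    }'
}

is_strict_date 2015115     # prints 0: ambiguous, now rejected
is_strict_date 20151-15    # prints 0: only one hyphen
is_strict_date 201511-5    # prints 0: only one hyphen
is_strict_date 2015-1-15   # prints 1
is_strict_date 2015-11-5   # prints 1
is_strict_date 20151105    # prints 1
```

Like the original ERE, this only checks the shape of the string; it does not reject impossible dates such as 2015-02-31.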