Valid separator in time and date format

Hello.
I can produce any output format I like (sensible or not) with the date command in bash.
Example:

~> date --date "now" '+%Y-%m-%d %H!%M!%S'
2019-06-03 12!55!33

or

~> date --date "now" '+%Y£%m£%d %H¤%M¤%S'
2019£06£03 12¤57¤36

or

~> date --date "now" '+%Y-%m-%d %H-%M-%S'
2019-06-03 12-58-51
 

But I can't verify whether an input date matches a particular format.
I use this piece of code for the check:

d='2019-05-31 17:00:00'
MY_FORMAT='+%Y-%m-%d %H-%M-%S'


if [[ "$(date --date "$d" "$MY_FORMAT" 2>/dev/null)" == "$d" ]] ; then
   echo "$d is valid to relative \"$MY_FORMAT\" date format"
else
   echo "$d is NOT VALID  to relative \"$MY_FORMAT\" date format"
fi

Example 1 : standard format and good date time ( Relative to date format template )

CURRENT FORMAT      : +%Y-%m-%d %H:%M:%S
DATE TO BE VERIFIED : 2019-05-31 17:00:00
Printing input date in current format : 
2019-05-31 17:00:00 
.
Now doing the test
.
2019-05-31 17:00:00 is valid to relative "+%Y-%m-%d %H:%M:%S" date format
.
.

Good result for the test

Example 2 : standard format and bad date time ( Relative to date format template )

CURRENT FORMAT      : +%Y-%m-%d %H:%M:%S
DATE TO BE VERIFIED : 2019-05-31 17-00-00
Printing input date in current format : 
date: invalid date '2019-05-31 17-00-00'
 
.
Now doing the test
.

2019-05-31 17-00-00 is NOT VALID  to relative "+%Y-%m-%d %H:%M:%S" date format
.
.

Good result for the test.
But this is a coincidence.
Whatever data is tested, the test will fail, because the time format 'HH-MM-SS' is refused by the date command.
So the test will always fail.

Example 3 : personal format and bad date time ( Relative to date format template )

CURRENT FORMAT      : +%Y-%m-%d %H-%M-%S
DATE TO BE VERIFIED : 2019-05-31 17:00:00
Printing input date in current format : 
2019-05-31 17-00-00 

.
Now doing the test
.

2019-05-31 17:00:00 is NOT VALID  to relative "+%Y-%m-%d %H-%M-%S" date format

.
.

Good result for the test.

The date is accepted by the date command,
but the test fails because the time is not in the template format 'yyyy-mm-dd HH-MM-SS'.

Example 4 : personal format and good date time ( Relative to date format template )

CURRENT FORMAT      : +%Y-%m-%d %H-%M-%S
DATE TO BE VERIFIED : 2019-05-31 17-00-00
Printing input date in current format : 
date: invalid date '2019-05-31 17-00-00'
 

.
Now doing the test
.

2019-05-31 17-00-00 is NOT VALID  to relative "+%Y-%m-%d %H-%M-%S" date format

.
.

Bad result for the test.
The test will always fail because the time format 'HH-MM-SS' is refused by the date command,
even though the input matches the template.

Is it possible to test a date against the format in use when that format is not a standard one?
Any help is welcome.


There are several different aspects to your question, so bear with me. I may come from a different scripting background, as I mostly write scripts that have to run on a multitude of different UNIX systems as well as Linux, so portability is of the utmost importance to me.

GNU is not a given
I often see the -d / --date option of the date command used. Note that standard-conforming date commands do NOT have it, so you are limiting your scripts to systems with GNU date installed (BSD date, IIRC, parses input through a different mechanism, -j -f). If this is OK for you, then so be it, but you should be aware that it is a design decision. If you want to avoid this dependency you might want to look at Perderabo's datecalc script, here is an example of how to use it, along with a C program you may find useful too.
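
Whether --date is available can be probed at run time instead of being assumed. This is only a minimal sketch: the function name date_supports_date_option is made up for illustration, and it merely checks that one known-good input is accepted.

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard): returns 0 if the local
# date command accepts a GNU-style --date option, non-zero otherwise.
date_supports_date_option() {
    date --date '1970-01-01 00:00:00 UTC' '+%s' >/dev/null 2>&1
}

if date_supports_date_option; then
    echo "date accepts --date; GNU-style parsing is available"
else
    echo "date rejects --date; use a portable fallback such as datecalc"
fi
```

A script that must run on several UNIX flavours could call such a probe once at start-up and pick its code path from the result.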

internal and external date representation
The way UNIX/Linux systems represent date information is like this: there is an "external" representation, which is what you are trying to work with. But this is only a rendering of what is used internally to measure time, the "UNIX time" or "epoch time". Traditionally it is a signed 32-bit integer counting the seconds since 00:00:00 UTC, Jan 1st, 1970. Notice that a 32-bit counter of this kind will overflow in January 2038; systems with a 64-bit time_t are not affected.
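
The 2038 limit corresponds to the largest value a signed 32-bit integer can hold, 2147483647. As a quick illustration, and assuming GNU date (the @ syntax for epoch input is a GNU extension), that value can be printed as a calendar date:

```shell
#!/bin/sh
# Largest value of a signed 32-bit integer: 2^31 - 1.
max32=2147483647

# Assumes GNU date: "@N" means N seconds since the epoch; -u prints UTC.
date -u --date "@$max32" '+%Y-%m-%d %H:%M:%S UTC'
# → 2038-01-19 03:14:07 UTC
```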

If you are planning to work with dates and do arithmetic with them, my suggestion is to do it like Perderabo: create a layer of scripts to convert everything to/from a common representation (ideally the epoch time, because it lends itself well to numeric manipulation/calculation) and then calculate with the resulting integers.
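
Such a conversion layer might look roughly like the following. It is a sketch, not the datecalc approach: it leans on GNU date (so it carries exactly the dependency discussed above), and the names to_epoch and from_epoch are invented for this example. All values are treated as UTC to keep the arithmetic free of daylight-saving effects.

```shell
#!/bin/sh
# Hypothetical conversion layer, assuming GNU date. Names are illustrative.

# to_epoch "YYYY-MM-DD HH:MM:SS" -> seconds since the epoch (input read as UTC)
to_epoch() {
    date -u --date "$1 UTC" '+%s'
}

# from_epoch N -> "YYYY-MM-DD HH:MM:SS" in UTC
from_epoch() {
    date -u --date "@$1" '+%Y-%m-%d %H:%M:%S'
}

start=$(to_epoch '2019-05-31 17:00:00')
end=$(to_epoch '2019-06-03 12:55:33')

# Once both values are epoch seconds, plain integer arithmetic is enough.
echo "elapsed seconds: $(( end - start ))"
echo "one day after start: $(from_epoch $(( start + 86400 )))"
```

Only the two small functions know about the external representation; everything in between works on integers.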

interpreting arbitrary formats
First off: arbitrary formats are exactly that: arbitrary. Interpreting them correctly is somewhat of a guessing game, and every rule you can come up with can be circumvented (or "made not to work") by some outlandish format. I once tried to create a "time format canonifier" which you might want to use as a starting point for your own development. It is not exactly what you want, but maybe you can get some ideas from it:

# f_ConvertTime: turn a loosely formatted time of day ("14.15.00", "14", ":15")
# into seconds since midnight. The result is printed; the return code is 1
# if hours, minutes or seconds are out of range.
# Written for ksh (uses print and typeset).
f_ConvertTime ()
{

typeset -i iRetVal=0
typeset    chTime="$1"
typeset -i iHours=0
typeset -i iMinutes=0
typeset -i iSeconds=0
typeset -i iTimeSecs=0

$chFullDebug                                       # presumably a debug hook (e.g. "set -x"), normally empty
                                                   # correct timestring
chTime="$(print - "$chTime" | sed 's/[^0-9]/:/g')" # 14.15.00 -> 14:15:00
chTime="00${chTime}:00:00"                         # 14 -> 0014:00:00
                                                   # :15 -> 00:15:00:00
if [ "$(print - "$chTime" | cut -d':' -f1)" != "" ] ; then
     iHours=$(print - "$chTime" | cut -d':' -f1)
     if [ $iHours -lt 0 -o $iHours -gt 23 ] ; then
          iRetVal=1
     fi
fi
if [ "$(print - "$chTime" | cut -d':' -f2)" != "" ] ; then
     iMinutes=$(print - "$chTime" | cut -d':' -f2)
     if [ $iMinutes -lt 0 -o $iMinutes -gt 59 ] ; then
          iRetVal=1
     fi
fi
if [ "$(print - "$chTime" | cut -d':' -f3)" != "" ] ; then
     iSeconds=$(print - "$chTime" | cut -d':' -f3)
     if [ $iSeconds -lt 0 -o $iSeconds -gt 59 ] ; then
          iRetVal=1
     fi
fi
(( iTimeSecs = iSeconds + iMinutes * 60 + iHours * 3600 ))

print - $iTimeSecs

return $iRetVal
}
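
The same normalise-then-check idea can be applied to the 'HH-MM-SS' layout from the original question. What follows is only a sketch under two assumptions: GNU date is available, and the layout is fixed to 'YYYY-MM-DD HH-MM-SS'. The function name is_valid_dash_stamp is invented for this example. It checks the shape of the input with a case pattern, rewrites the time separators to ':' so that date can parse the string, and then compares the round-tripped value.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU date (--date is not portable).
# Succeeds only if $1 has the layout 'YYYY-MM-DD HH-MM-SS' and names a real date and time.
is_valid_dash_stamp() {
    # 1. Shape check: digits and separators exactly where the layout wants them.
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' '[0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac

    # 2. Normalise: keep the date part, turn HH-MM-SS into HH:MM:SS.
    _date_part=${1% *}
    _time_part=$(printf '%s\n' "${1#* }" | sed 's/-/:/g')
    _normalized="$_date_part $_time_part"

    # 3. Value check: date must accept the normalised string and print it back unchanged.
    #    UTC is used so that daylight-saving gaps cannot reject a valid time.
    [ "$(date -u --date "$_normalized UTC" '+%Y-%m-%d %H:%M:%S' 2>/dev/null)" = "$_normalized" ]
}

for candidate in '2019-05-31 17-00-00' '2019-05-31 17:00:00' '2019-02-30 17-00-00'; do
    if is_valid_dash_stamp "$candidate"; then
        echo "$candidate is valid"
    else
        echo "$candidate is NOT valid"
    fi
done
```

The shape check is what rejects '17:00:00' here; the round trip through date is what rejects impossible values such as February 30th.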

I hope this helps.

bakunin


Where I ran into real-world problems was with free-form entry in date (as text) fields. So, based on my experience, the mix provided by the OP is not realistic.

I had to correct several hundred million rows of almost unbelievable garbage dates in a transaction table. The data was entered by users in several countries.

Example: "Apr 4". Okay. What year? I had to use the table's unique sequence and search nearby to find temporally close rows until I found a year. So for each bizarre date problem we ran a separate script with a subsequent validation script. It took several weeks to fix the mess.

Overall, run times improved as more neighbors were "fixed", at least for this one kind of problem.


I think the way to tackle this problem is to start by converting the format string to a RE with named groups, e.g. %d-%m-%Y would become (?<d>\d\d)-(?<m>\d\d)-(?<Y>\d{4})

There are still locale issues, as %a would become (?<a>Mon|Tue|Wed|Thu|Fri|Sat|Sun) in English locales but (?<a>Lun|Mar|Mer|Jeu|Ven|Sam|Dim) in French locales.

Once this is done, if a string matches the RE you can then pull out the named group values and verify that they go together correctly, e.g. that a=Tue is correct for d=4 m=6 Y=2019.
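
The format-to-RE step can be sketched in plain shell as well. This is an illustration only: the function names fmt_to_regex and matches_format are invented here, only %Y, %m, %d, %H, %M and %S are handled, and because grep -E uses POSIX extended regular expressions (which have no named groups) the groups are simply numbered. The format is passed without the leading '+' that date expects. Note that this checks the shape only, not whether the values make a real date.

```shell
#!/bin/sh
# Hypothetical helper: translate a small subset of date format conversions
# into a POSIX extended regular expression anchored at both ends.
fmt_to_regex() {
    _re=$(printf '%s\n' "$1" | sed \
        -e 's/[][\.*^$+?(){}|]/\\&/g' \
        -e 's/%Y/([0-9]{4})/g' \
        -e 's/%[mdHMS]/([0-9]{2})/g')
    # Any '%' still present belongs to a conversion this sketch does not know.
    case $_re in
        *%*) echo "unsupported conversion in format: $1" >&2; return 2 ;;
    esac
    printf '^%s$\n' "$_re"
}

# matches_format STRING FORMAT: succeeds if STRING has the shape FORMAT describes.
matches_format() {
    _pattern=$(fmt_to_regex "$2") || return 2
    printf '%s\n' "$1" | grep -Eq -- "$_pattern"
}

fmt_to_regex '%Y-%m-%d %H-%M-%S'
matches_format '2019-05-31 17-00-00' '%Y-%m-%d %H-%M-%S' && echo "shape matches"
matches_format '2019-05-31 17:00:00' '%Y-%m-%d %H-%M-%S' || echo "shape does not match"
```

The first sed expression escapes regex metacharacters in the separators before the conversions are expanded, so a format such as '%d.%m.%Y' only matches literal dots.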

Been thinking about this problem and thought I'd give it a bit of a go with gawk.

This currently only supports %d , %m , %Y and %a (current locale strings)

gawk -v e='%a %d-%m-%Y' '
function regdt(str) {
   c=substr(str, 1, 1)
   ret = "("
   switch(c) {
      case "a":
         ret = ret "[A-Z][a-z]{2}"
      break
      case "d":
         ret = ret "[0-9]{2}"
      break
      case "m":
         ret = ret "[0-9]{2}"
      break
      case "Y":
         ret = ret "[0-9]{4}"
      break
      default:
         print "Unsupported date fmt string: " c
   }
   ctype[capture++] = c
   return ret ")" substr(str, 2)
}
function verify(i,dow,day,month,year) {
   # walk the captures in order: got[i] belongs to conversion ctype[i-1]
   for(i=1; i<=capture; i++) {
      value=got[i]
      switch(ctype[i-1]) {
         case "a":
             dow = value
         break
         case "d":
             day = value + 0
             if (day < 1 || day > 31)
               return "Illegal dom: " day
         break
         case "m":
             month = value + 0
             if (month < 1 || month > 12)
               return "Illegal month num: " month
         break
         case "Y":
             year = value + 0
             if (year < 1900 || year > 2200)
               return "Illegal year: " year
         break
      }
   }
   if(day == 0 || month == 0 || year == 0)
      return "Must have day, month and year"
   dt=mktime(year " " month " " day " 00 00 00")

   if (day != strftime("%d", dt) + 0)   # + 0 forces a numeric compare ("04" vs 4)
       return "  Illegal day for month" 

   if (dow != "" && strftime("%a", dt) != dow)
       return "  Wrong dow " dow " should be " strftime("%a", dt)

   return ""
}
BEGIN {
   vals=split(e, vl, "%");
   for(i=1; i<=vals; i++)
      if (i==1) expr = vl[i];
      else expr=expr regdt(vl[i])
   print "Regexp is " expr
}
{
    print "String: " $0
    if(match($0, expr, got)) {
        erstr=verify()
        if(erstr == "") print "  Date OK"
        else print "  " erstr
    } else
       print "  Date doesnt match " e " format"
}
' infile

Infile is

Sat 29-12-2018
Thu 10-02-2019
Fri 30-02-2019
Wed 29-15-2018
Tue 2/2/1900

Result:

Regexp is ([A-Z][a-z]{2}) ([0-9]{2})-([0-9]{2})-([0-9]{4})
String: Sat 29-12-2018
  Date OK
String: Thu 10-02-2019
    Wrong dow Thu should be Sun
String: Fri 30-02-2019
    Illegal day for month
String: Wed 29-15-2018
  Illegal month num: 15
String: Tue 2/2/1900
  Date doesnt match %a %d-%m-%Y format
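
The weekday results above can be cross-checked from the shell. This is a small sketch assuming GNU date; the function name weekday_of is invented for the example. The dates are given in ISO order (YYYY-MM-DD) because date --date does not parse the dd-mm-yyyy layout of the input file, and LC_ALL=C pins the abbreviations to English so the comparison does not depend on the caller's locale.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU date: print the English weekday
# abbreviation for an ISO date (YYYY-MM-DD).
weekday_of() {
    LC_ALL=C date -u --date "$1" '+%a'
}

for pair in 'Sat 2018-12-29' 'Thu 2019-02-10'; do
    claimed=${pair% *}      # text before the space: the weekday in the input
    iso=${pair#* }          # text after the space: the date itself
    actual=$(weekday_of "$iso")
    if [ "$claimed" = "$actual" ]; then
        echo "$pair: Date OK"
    else
        echo "$pair: Wrong dow $claimed should be $actual"
    fi
done
```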