Date validity check

Hi all,

I have a file with 2000 records like this:

test.txt

2011-03-01
2011-03-01
2011-03-01
2011-03-01
2011-03-01
2011-03-02
2011/03/02

Previously I used a for loop to check the dates, like below:

for i in `cat test.txt`
do
d=`echo $i | cut -c9-10| sed 's/^0*//'`;
m=`echo $i | cut -c6-7`;
Y=`echo $i | cut -c1-4`;
if cal $m $Y| tr -s " " "|" | tr -s "\n" "|" | grep $d > /dev/null 2>&1;
then 
a=1;
else 
echo "N" ; 
fi
done

But it takes a long time when the file has 7000 records. I need a command to find out whether there is any invalid date in the file. If any one of the dates is invalid, I need to return a flag.

Please help ASAP.

With GNU date:

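# Note: GNU date is lenient about the format (it also accepts 2011/03/02),
# so this only reports dates that cannot exist
while IFS= read -r d
do
    date -d "$d" >/dev/null 2>&1 || printf '%s\n' "$d"
done < test.txt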

I'm assuming the bad date format is the last one in your test.txt output above. This will print the line number and the bad date format.

perl -ne 'print "Line #$.: $1\n" if(/(\d+\/\d+\/\d+)/)' test.txt
Line #9: 2011/03/02

You call many external commands (cut, sed etc.), each of which takes time to start a subprocess, and this is what is costing you.

I'm aware that date -d is not available in all implementations. What OS are you using?

Trying to avoid being OS specific (and not the neatest code) could you consider this:-

$ cat test.txt
2011-03-01
2011-03-01
2011-03-01
2011-03-01
2011-03-01
2011-03-02
2011/03/02
$ (IFS=-;tr "\/" "-" < test.txt | while read Y m d
> do
>    echo "I got \$Y=\"$Y\", \$m=\"$m\" and \$d=\"$d\""
> done)
I got $Y="2011", $m="03" and $d="01"
I got $Y="2011", $m="03" and $d="01"
I got $Y="2011", $m="03" and $d="01"
I got $Y="2011", $m="03" and $d="01"
I got $Y="2011", $m="03" and $d="01"
I got $Y="2011", $m="03" and $d="02"
I got $Y="2011", $m="03" and $d="02"
$ 

If you can be sure that you don't actually have any / as date separators in your input (as you posted, possibly in error) then it simplifies to just:-

$ (IFS=- ; while read Y m d
> do
>    echo "I got \$Y=\"$Y\", \$m=\"$m\" and \$d=\"$d\""
> done <test.txt)

I'm not quite sure what you are trying to do with the remainder. Are you trying to validate that it is an acceptable date?

If you are sure you are getting just numerics, that might be better as a case statement like this:-

case $m in
   01) mxd=31 ;;
   02) if (( $Y % 4 == 0 && ( $Y % 100 != 0 || $Y % 400 == 0 ) )); then mxd=29; else mxd=28; fi ;;   # 29 days in a leap year, else 28
   03) mxd=31 ;;
   04) mxd=30 ;;
   05) mxd=31 ;;
   06) mxd=30 ;;
   07) mxd=31 ;;
   08) mxd=31 ;;
   09) mxd=30 ;;
   10) mxd=31 ;;
   11) mxd=30 ;;
   12) mxd=31 ;;
   *) echo "Invalid month" ; exit 99
esac

if [ $d -gt $mxd -o $d -lt 1 ]
then
   echo "Invalid day" ; exit 99
fi

Of course, if you are just looking for the right format, then:-

grep -v "^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$" test.txt
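
If all that is needed is a single yes/no flag for the whole file, the format test and the days-per-month test can be folded into one awk pass, so only one process is started however many records there are. This is only a minimal sketch: the function name has_invalid_date is made up for illustration, and it assumes a POSIX awk (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# has_invalid_date [FILE...]
# Returns 0 (true) if the input contains at least one line that is not a
# real calendar date in YYYY-MM-DD form, and 1 otherwise.
# Reads standard input when no FILE is given. Blank lines count as invalid.
# Assumption: awk is a POSIX awk (swap in nawk on Solaris if needed).
has_invalid_date() {
    awk -F- '
        BEGIN { split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ") }
        {
            bad = 0
            # Format check: four digits, two digits, two digits.
            if ($0 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) {
                bad = 1
            } else {
                y = $1 + 0; m = $2 + 0; d = $3 + 0
                if (m < 1 || m > 12) {
                    bad = 1
                } else {
                    max = dim[m]
                    # Gregorian leap year rule for February.
                    if (m == 2 && y % 4 == 0 && (y % 100 != 0 || y % 400 == 0))
                        max = 29
                    if (d < 1 || d > max)
                        bad = 1
                }
            }
            # Stop at the first bad line; the END block sets the exit status.
            if (bad) { found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$@"
}
```

It could then be used along the lines of: if has_invalid_date test.txt; then echo "N"; fi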

I hope that this helps,
Robin
Liverpool/Blackburn
UK

Hi all,

My requirement: the file contains only dates, and I have to find out whether any of them is invalid. If any date is invalid, the only result I need is that the file is invalid.

The correct date format is "2011-03-13" (YYYY-MM-DD).

The OS and version I am using:

SunOS upp21n 5.10 Generic_148888-05 sun4u sparc SUNW,SPARC-Enterprise

Python makes this task insanely simple:

>>> import time
>>> mask = '%Y-%m-%d'
>>> if time.strptime('2013-12-12', mask):
...    print("date ok")
...
date ok

Complete script:

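#!/usr/bin/python
import time

def check_mask(datec, mask='%Y-%m-%d'):
    """Return True if datec parses as a real date in the given format."""
    try:
        time.strptime(datec, mask)
    except ValueError:
        return False
    return True

# Text mode, so the same script runs under Python 2 and Python 3
with open('./test.txt') as f:
    for d in f:
        d = d.rstrip()
        if not check_mask(d):
            print("Invalid: %s" % d)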

Okay, so I can refine the simple grep and then apply the other parts in sequence. The first test is a basic structural validation; it is followed by a date check that does not call date or cal, since those would slow down processing on a large file too.

#!/bin/ksh
echo "Basic invalid formatted dates found:-"
grep -v "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" test.txt | while read line
do
   echo "\t$line"
done

echo "\nLooking for invalid date values:-"
(IFS=-
typeset -i d m Y mxd L
grep "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" test.txt | while read Y m d
do
   case $m in
      1)  mxd=31 ;;
      2)  if (( $Y % 4 == 0 && ( $Y % 100 != 0 || $Y % 400 == 0 ) )); then mxd=29; else mxd=28; fi ;; # Handles leap year
      3)  mxd=31 ;;
      4)  mxd=30 ;;
      5)  mxd=31 ;;
      6)  mxd=30 ;;
      7)  mxd=31 ;;
      8)  mxd=31 ;;
      9)  mxd=30 ;;
      10) mxd=31 ;;
      11) mxd=30 ;;
      12) mxd=31 ;;
      *)  mxd=0  ;;
   esac

   if [ ${d} -gt ${mxd} -o ${d} -lt 1 ]
   then
      echo "\tInvalid date found \"${Y}-${m}-${d}\""
   fi
done)

The echo statements may need to be replaced with printf depending on your OS.
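
As a rough illustration of that swap (the helper name report_bad is made up here, not part of the script above), printf handles the tab and newline escapes itself, so the result does not depend on how a particular echo treats backslashes:

```shell
# report_bad LINE
# Prints LINE indented by one tab and followed by a newline.
# The escapes live in the printf format string, so they are interpreted
# the same way whichever shell runs this, unlike echo "\t...".
report_bad() {
    printf '\t%s\n' "$1"
}
```

Inside the loops, echo "\t$line" would then become report_bad "$line".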

I must say that I like the Python way, if you have that.

Additionally, if you have a database, that may have similar tools.

I hope that this helps,
Robin

@rbatte1

The above case statement is throwing an error.

I need a shell script or Perl only, as that is all my system supports.

I should not read line by line, because with 5 lakh (500,000) records it will take a long time.

I have thought of an approach. Could anyone help me turn it into a script?

test.txt - main file

test.txt

2013-12-12
2013-13-12
2013-12-32
2012-02-29
2012-02-28
2013-12-31
2013-11-31
2013-11-30
2013/12/12
2013-32-12
2013-12-42

1. First-level format check. Lines such as 2013/12/12, 2013/23/43, 2013-13-32 or 2013-12-43 will be captured here:
grep -v "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" test.txt | while read line
do
   echo "\t$line" >> vv.txt
done
Header_chk=`cat vv.txt | wc -l`
Header_month_max=`cat test.txt | cut -c6-7 | sort -nr | head -1` ;
Header_month_min=`cat test.txt | cut -c6-7 | sort -nr | tail -1` ;
Header_Day_max=`cat test.txt | cut -c9-10 | sort -nr | head -1` ;
Header_Day_min=`cat test.txt | cut -c9-10 | sort -nr | tail -1` ;
 if [[   "$Header_chk" -ne 0 || "$Header_month_max" -gt "12" || "$Header_Day_max" -gt "31" || "$Header_month_min" -eq "00" || "$Header_Day_min" -eq "00" ]]
   then
   echo "N">>vijay.txt
 fi

2. Separate the February dates into one file, feb.txt, then split feb.txt into two files (leap.txt and nonleap.txt).

leap.txt

Day_max=`cat leap.txt | cut -c9-10 | sort -nr | head -1` ;
Day_min=`cat leap.txt | cut -c9-10 | sort -nr | tail -1` ;
  if [[   "$Day_max" -gt 29 || "$Day_min" -eq "00"  ]]
   then
   echo "N">>vijay.txt
  fi

nonleap.txt

Day_max=`cat nonleap.txt | cut -c9-10 | sort -nr | head -1` ;
Day_min=`cat nonleap.txt | cut -c9-10 | sort -nr | tail -1` ;
  if [[   "$Day_max" -gt 28 || "$Day_min" -eq "00"  ]]
   then
   echo "N">>vijay.txt
  fi

But how do I split the leap-year dates and the non-leap-year dates into two files? :(

3. Separate April, June, September and November into one file, 30day.txt.

Day_max=`cat 30day.txt | cut -c9-10 | sort -nr | head -1` ;
Day_min=`cat 30day.txt | cut -c9-10 | sort -nr | tail -1` ;
  if [[   "$Day_max" -gt 30 || "$Day_min" -eq "00"  ]]
   then
   echo "N">>vijay.txt
  fi

Finally, after all the validation, if vijay.txt contains an N, an invalid date was found in one of the checks, and I will report the main file as invalid.
Please help me turn this scenario into a script.

Putting code or output into CODE tags makes it a lot more readable. Can you edit your post to put them in, please?

You also seem to be persisting with your original costly code, which fires up all sorts of processes (cut, cat, sort, etc.). Can you show me the errors you are getting with my offering, or explain why it is unacceptable? I can then try to help.

Reading the unformatted block of text, you seem to like my grep so I'd like to know how I can improve the rest.

Robin

You can do this easily in perl

validate.pl:

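#!/usr/bin/perl
use strict;
use warnings;
use Time::Local 'timelocal';

while (my $ln = <STDIN>) {
    chomp $ln;
    # Require YYYY-MM-DD exactly; any other layout makes the file invalid
    my ($year, $month, $day) = $ln =~ /^(\d{4})-(\d\d)-(\d\d)$/
        or exit 1;
    # timelocal dies if the month or day is out of range
    eval { timelocal(0, 0, 0, $day, $month - 1, $year) };
    exit 1 if $@;
}
exit 0;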

Test the file from your shell like this:

if ./validate.pl < test.txt
then
    echo "File is valid"
else
    echo "File is invalid"
fi

uapp291n  -> echo "Basic invalid formatted dates found:-"
grep -v "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" vv.txt | while read line
do
   echo "\t$line"
doneBasic invalid formatted dates found:-
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  -> grep -v "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" vv.txt | while read line
> done
>    echo "\t$line"
> done
        2011/03/01
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  -> echo "\nLooking for invalid date values:-"
Looking for invalid date values:-
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  -> (IFS=-
> typeset -i d m Y mxd L
grep "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" vv.txt | while read Y m d
> grep "^[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]$" vv.txt | while read Y m d
> do
>    case $m in
>       1)  mxd=31 ;;
>       2)  ((L=($Y%4)/($Y%4))) 2>/dev/null ; ((mxd=29-$L)) ;; # Handles leap year
ksh: syntax error: `>' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       3)  mxd=31 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       4)  mxd=30 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       5)  mxd=31 ;;
ksh: syntax error: `)' unexpected
      6)  mxd=30 ;;
      7)  mxd=31 ;;
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       6)  mxd=30 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       7)  mxd=31 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       8)  mxd=31 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       9)  mxd=30 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       10) mxd=31 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       11) mxd=30 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       12) mxd=31 ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->       *)  mxd=0  ;;
ksh: syntax error: `)' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->    esac
ksh: syntax error: `esac' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->    if [ ${d} -gt ${mxd} -o ${d} -lt 1 ]
   then
      echo "\tInvalid date found \"${Y}-${m}-${d}\""
   fi
>    thenne)
>       echo "\tInvalid date found \"${Y}-${m}-${d}\""
>    fi
ksh: test: argument expected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  -> done)
ksh: syntax error: `done' unexpected
/ddsaa/dd/src_files/dddd/bisslling/ihguapp291n  ->
 

This is the error I am getting.

Please help.

It would be nice if you wrap the output in CODE tags. It makes it so much easier to read for such little effort. Simply highlight the output or code, then on the toolbar of the little editing window, press the white square that has "co" over "de", between the speech bubble and the page with "php" on it.

Alternatively, add the text [ C O D E ] before the text and [ / C O D E ] afterwards manually.

Complaint over.

Perhaps I should have said that this is a shell script and is not meant to be pasted at the command line. Can you save it all in a file? Let's call it datescan as an example, and then do the following on the command line:-

chmod 755 datescan
./datescan

What output (remember the CODE tags) do you get from that?

Regards,
Robin