Finding the date numeral in a file name and checking that the date format is valid

manas_ranjan · November 17, 2011, 5:23am

Hi there,

I have file names in different formats, as below:

triss_20111117_fxcb.csv
triss_fxcb_20111117.csv
xpnl_hypo_reu_miplvdone_11172011.csv
xpnl_hypo_reu_miplvdone_11-17-2011.csv
xpnl_hypo_reu_miplvdone_20111117.csv
xpnl_hypo_reu_miplvdone_20111117xfb.csv
triss_fxcb_20111117.csv.checksum

Now this little piece of awk gives me only the date from the file name:

echo $name | awk -F"_" '{  for(i=1;i<=NF;++i) if($i ~ /[[:digit:]]/) print $i}'

If name is triss_20111117_fxcb.csv, the output is 20111117, which is perfect.
If name is xpnl_hypo_reu_miplvdone_11-17-2011.csv, the output is 11-17-2011.csv, which is not what I want because of the .csv.
If name is xpnl_hypo_reu_miplvdone_20111117xfb.csv, the output is 20111117xfb.csv.
If name is triss_fxcb_20111117.csv.checksum, the output is 20111117.csv.checksum.

The question is: how can I remove .csv (or any other characters) from the output, since I only need the date from the file name?

And once I have the date in a format like

YYYYMMDD,DDMMYYYY,MMDDYYYY or YYYY-MM-DD

how can I check that it is a valid date? The date can be in any of the above forms,
e.g.:
11-17-2011
20111117
11172011

Franklin52 · November 17, 2011, 5:46am

Try:

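awk -F'[._]' '{
  for(i=1;i<=NF;++i) {
    s=$i
    # gsub() returns the number of replacements, i.e. how many digits the field has
    if(gsub("[0-9]","",s)==8){
      gsub("[a-zA-Z]","",$i)   # strip letters, e.g. the xfb in 20111117xfb
      print $i
    }
  }
}' file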
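To see what Franklin52's program does with the sample names, here it is wrapped in a shell function (the name `franklin_extract` is made up for this sketch), with the `-F` argument quoted so the shell cannot glob-expand `[._]`, and with `""` in place of the never-set variable `x`. Each name is split on `.` and `_`, and a field is printed only if it contains exactly eight digits, after its letters are removed.

```shell
#!/bin/sh
# franklin_extract: hypothetical name; the awk program is Franklin52's.
franklin_extract() {
  awk -F'[._]' '{
    for (i = 1; i <= NF; ++i) {
      s = $i
      # gsub() returns how many digits it removed from the copy s
      if (gsub(/[0-9]/, "", s) == 8) {
        gsub(/[a-zA-Z]/, "", $i)   # drop letters such as the xfb suffix
        print $i
      }
    }
  }'
}

franklin_extract <<'EOF'
triss_20111117_fxcb.csv
xpnl_hypo_reu_miplvdone_11-17-2011.csv
xpnl_hypo_reu_miplvdone_20111117xfb.csv
triss_fxcb_20111117.csv.checksum
EOF
```

For these four names it should print 20111117, 11-17-2011, 20111117 and 20111117. Because it keys on the eight-digit count rather than on position, the same program also picks 11-17-2011 out of hvar_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec, the name that comes up later in the thread.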

Klashxx · November 17, 2011, 5:53am

If perl is OK, this is the equivalent for extracting the dates:

 perl -n -e 'print "$1\n" if /(\d+-*\d*-*\d*)/;' file

And for the validation, you can start with:

#!/usr/bin/env ksh

validaF ()
{
fecha="${1}"

# -l strips the newline from each input line and adds one to every print
echo "${fecha}"|perl -l -n -e '
   if ( m!^((?:19|20)\d\d)(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])$!) {
      # $1 -> year, $2 -> month, $3 -> day
      if ($3 == 31 and ($2 == 4 or $2 == 6 or $2 == 9 or $2 == 11)) {
         print "0" ; # months with 30 days
         }
      elsif ($3 >= 30 and $2 == 2) {
         print "0" ; # February never has 30 or 31 days
         } 
      elsif ($2 == 2 and $3 == 29 and not ($1 % 4 == 0 and ($1 % 100 != 0 or $1 % 400 == 0))) {
         print "0" ; # February 29 in a non-leap year
         } 
      else {
         print "Correct date !: $_"; # valid date
         }
      } 
    else {
       print "KO: $_" ; # not in a date format
    }'
}

validaF 20110811
validaF 20110841

Of course you need to adapt the regex to match the rest of the date formats.
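Since the original poster cannot use perl, here is a rough POSIX awk sketch of the same checks, extended to all five layouts listed in the thread (YYYYMMDD, DDMMYYYY, MMDDYYYY, MM-DD-YYYY, YYYY-MM-DD). The function name `valid_date` and the 1900-2099 year window are assumptions made for this example. For an 8-digit value it accepts the date if any of the three 8-digit layouts gives a real calendar date, since the digits alone cannot say which layout was meant.

```shell
#!/bin/sh
# valid_date: hypothetical helper, POSIX awk only (no perl, no gawk extensions).
# Assumption: years are limited to 1900-2099, like the (19|20) regexes in the thread.
valid_date() {
  echo "$1" | awk '
  function leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
  # 1 if y-m-d is a real calendar date, else 0
  function ok(y, m, d,    mdays) {
    y += 0; m += 0; d += 0    # substr() returns strings; force numeric compares
    if (y < 1900 || y > 2099 || m < 1 || m > 12 || d < 1) return 0
    mdays = substr("312831303130313130313031", 2 * m - 1, 2) + 0
    if (m == 2 && leap(y)) mdays = 29
    return d <= mdays
  }
  {
    s = $0; good = 0
    if (s ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)          # YYYY-MM-DD
      good = ok(substr(s, 1, 4), substr(s, 6, 2), substr(s, 9, 2))
    else if (s ~ /^[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/)     # MM-DD-YYYY
      good = ok(substr(s, 7, 4), substr(s, 1, 2), substr(s, 4, 2))
    else if (s ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) {     # 8 digits
      # accept if YYYYMMDD, DDMMYYYY or MMDDYYYY gives a real date
      good = ok(substr(s, 1, 4), substr(s, 5, 2), substr(s, 7, 2))
      if (!good) good = ok(substr(s, 5, 4), substr(s, 3, 2), substr(s, 1, 2))
      if (!good) good = ok(substr(s, 5, 4), substr(s, 1, 2), substr(s, 3, 2))
    }
    tag = good ? "OK: " : "KO: "
    print tag s
  }'
}

valid_date 11-17-2011
valid_date 20110229
```

This should print `OK: 11-17-2011` and `KO: 20110229` (2011 is not a leap year, and the other two readings of 20110229 are not real dates either).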

manas_ranjan · November 17, 2011, 6:01am

Thanks Klashxx and Franklin for your suggestions, but unfortunately I can't use perl in my case.
I found Franklin's suggestion works for all types of files.

In the meantime I tried the option below:

echo $name | awk -F'[^0-9]' '{  for(i=1;i<=NF;++i) if($i ~ /[[:digit:]]/) print $i}'

which also works for almost all the conditions, except when other characters sit between the digits. For example, if the filename is
xpnl_hypo_reu_miplvdone_11-17-2011.csv, the output is

11
17
2011

Can anyone suggest how to improve this one-liner? I don't mind if the - between the numbers is removed, but at least the digits should come out on one line, not one line after another.

Anyway, can anyone suggest how to validate the date output, please?

Klashxx · November 17, 2011, 6:09am

Try:

awk -F'[^0-9-]' '{  for(i=1;i<=NF;++i) if($i ~ /[[:digit:]]/) print $i}'
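Here is that one-liner run on two of the names from the thread. It is wrapped in a function (`digit_fields` is a made-up name), and `[[:digit:]]` is written as the equivalent `[0-9]` so it also works with awks that lack POSIX character classes. Every character other than a digit or `-` separates fields, so 11-17-2011 stays in one field.

```shell
#!/bin/sh
# digit_fields: hypothetical name for Klashxx's one-liner.
# FS is any character that is neither a digit nor "-".
digit_fields() {
  awk -F'[^0-9-]' '{ for (i = 1; i <= NF; ++i) if ($i ~ /[0-9]/) print $i }'
}

echo xpnl_hypo_reu_miplvdone_11-17-2011.csv | digit_fields
echo hvar_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec | digit_fields
```

The first name gives just 11-17-2011, but the second also prints 10 and 17 before the date, because any field containing a digit is printed. That is the failure reported further down.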

manas_ranjan · November 17, 2011, 6:18am

You are a star, man.
Now can you help me validate that output as a correct date?
My date can be in any of the forms below:

YYYYMMDD,DDMMYYYY,MMDDYYYY,MM-DD-YYYY or YYYY-MM-DD

ahamed101 · November 17, 2011, 6:19am

Use printf instead of print, so the pieces are not written on separate lines.

--ahamed
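One way to read this suggestion: print each digit group with `printf "%s"`, which adds no newline, and finish the record with a single `print ""`. A sketch based on the poster's `-F'[^0-9]'` command, with the function name `join_digits` made up for the example:

```shell
#!/bin/sh
# join_digits: hypothetical name. Non-digits separate fields; the digit
# groups are written back to back and the line is ended once.
join_digits() {
  awk -F'[^0-9]' '{
    for (i = 1; i <= NF; ++i)
      if ($i ~ /[0-9]/) printf "%s", $i   # no newline between pieces
    print ""                              # end the output line once
  }'
}

echo xpnl_hypo_reu_miplvdone_11-17-2011.csv | join_digits
```

This prints 11172011 (the dashes are lost, which the poster said is acceptable), but it glues every digit run together, so hvar_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec comes out as 101711172011.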

kondeti_satish · November 17, 2011, 6:31am

echo "$name" | sed 's/[a-z_.]//g'

manas_ranjan · November 17, 2011, 9:23am

Thanks, but still:

how do I validate the date when the output can be in any of these formats,
not a fixed one?

Is there any way to do so?

---------- Post updated at 06:34 AM ---------- Previous update was at 06:32 AM ----------

Excellent, man.

---------- Post updated at 09:23 AM ---------- Previous update was at 06:34 AM ----------

If name=hvar_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec,
then none of the tricks work.
Can anyone come up with a better idea to extract only the date field?

ahamed101 · November 17, 2011, 9:37am

Try this...

sed 's/.*\([0-9]\{2\}[- ]*[0-9]\{2\}[- ]*[0-9]\{4\}\).*/\1/g' input_file

--ahamed
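A small variation on this sed command, sketched with a made-up function name `extract_dates`: with `-n` and the `p` flag, sed prints only the lines where a date was found (the command as posted prints non-matching lines unchanged), and the `g` flag is dropped because the pattern consumes the whole line anyway.

```shell
#!/bin/sh
# extract_dates: hypothetical name. Same regex as ahamed101's command;
# -n plus the p flag suppresses lines with no NN[- ]NN[- ]NNNN date.
extract_dates() {
  sed -n 's/.*\([0-9]\{2\}[- ]*[0-9]\{2\}[- ]*[0-9]\{4\}\).*/\1/p'
}

extract_dates <<'EOF'
triss_20111117_fxcb.csv
xpnl_hypo_reu_miplvdone_11-17-2011.csv
hvar_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec
no_date_here.txt
EOF
```

This should print 20111117, 11-17-2011 and 11-17-2011, and nothing for no_date_here.txt. The pattern only fits eight consecutive digits or the NN-NN-NNNN layout, so a YYYY-MM-DD date such as 2011-11-17 would not be found.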

Klashxx · November 18, 2011, 2:58pm

If gawk (GNU awk) is available to you (it is on most Linux distros), you can do the following:

# cat file
triss_20111117_fxcb.csv
triss_fxcb_20111117.csv
xpnl_hypo_reu_miplvdone_11172011.csv
xpnl_hypo_reu_miplvdone_11-17-2011.csv
xpnl_hypo_reu_miplvdone_20111117.csv
xpnl_hypo_reu_miplvdone_20111117xfb.csv
triss_fxcb_20111117.csv.checksum
r_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec

# gawk 'match($0,/([0-9]+-*[0-9]+-*[0-9]+)/,a){print a[1]}' file
20111117
20111117
11172011
11-17-2011
20111117
20111117
20111117
11-17-2011
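The three-argument form of `match()` used here is a gawk extension. If only a POSIX awk is available, the same regular expression can be used with the standard two-argument `match()`, which sets `RSTART` and `RLENGTH`. A sketch (`extract_date` is a made-up name):

```shell
#!/bin/sh
# extract_date: hypothetical name. POSIX match() sets RSTART/RLENGTH,
# so substr() can cut out the matched text without gawk arrays.
extract_date() {
  awk 'match($0, /[0-9]+-*[0-9]+-*[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' triss_20111117_fxcb.csv \
  xpnl_hypo_reu_miplvdone_11-17-2011.csv \
  r_rgrpd_10d_hvams17_11-17-2011_kgr_prod.rec | extract_date
```

This should produce the same dates as the gawk command. The regex needs at least three digits in a row (dashes aside), which is why the 10 in 10d and the 17 in hvams17 are skipped.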

Then it is easy to translate the perl code to awk:

#!/usr/bin/env ksh

# YYYYMMDD,DDMMYYYY,MMDDYYYY or YYYY-MM-DD

validaF ()
{
dateV="${1}"

echo "${dateV}"|gawk  '{
   # a[1] = year, a[2] = its first two digits (19|20), a[3] = month, a[4] = day
   if (match($0,/^((19|20)[0-9][0-9])-*(0[1-9]|1[012])-*(0[1-9]|[12][0-9]|3[01])$/,a)) {
      year=a[1]+0
      mon=a[3]+0
      day=a[4]+0
      }
   else if (match($0,/^(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])((19|20)[0-9][0-9])$/,a)) {
      year=a[3]+0
      mon=a[2]+0
      day=a[1]+0
      }
   else if (match($0,/^(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])((19|20)[0-9][0-9])$/,a)) {
      year=a[3]+0
      mon=a[1]+0
      day=a[2]+0
      }
   else {
       print "KO: "$0
       exit
     }

   if (day == 31 && (mon == 4 ||  mon == 6 || mon == 9 || mon == 11)) 
      print "KO: "$0 # 30-day months
   else if (day >= 30 && mon == 2) 
      print "KO: "$0 # February never has 30 or 31 days
   else if (mon == 2 && day == 29 && ! (  year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)))
      print "KO: "$0 # February 29 only in leap years
   else  
      print "Correct date !:" $0 
   }'

}


validaF 11082011
validaF 12312010
validaF 13312010
validaF 20110811
validaF 20110841
validaF 2012-02-29
validaF 20110229

# date.sh
Correct date !:11082011
Correct date !:12312010
KO: 13312010
Correct date !:20110811
KO: 20110841
Correct date !:2012-02-29
KO: 20110229
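One thing the validator cannot settle is which layout an 8-digit value was written in: `validaF 11082011` is reported correct because the DDMMYYYY branch is tried first (11 August 2011), yet read as MMDDYYYY it is 8 November 2011, also a real date. A rough POSIX awk sketch that lists every valid reading of an 8-digit value (the helper name `readings` and the 1900-2099 year window are assumptions for the example):

```shell
#!/bin/sh
# readings: hypothetical helper. For an 8-digit value, print each layout
# (YYYYMMDD, DDMMYYYY, MMDDYYYY) that gives a real date, as YYYY-MM-DD.
# Assumption: years limited to 1900-2099.
readings() {
  echo "$1" | awk '
  function leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
  function ok(y, m, d,    mdays) {
    if (y < 1900 || y > 2099 || m < 1 || m > 12 || d < 1) return 0
    mdays = substr("312831303130313130313031", 2 * m - 1, 2) + 0
    if (m == 2 && leap(y)) mdays = 29
    return d <= mdays
  }
  function show(label, y, m, d) {
    y += 0; m += 0; d += 0    # substr() returns strings; compare as numbers
    if (ok(y, m, d)) printf "%s -> %04d-%02d-%02d\n", label, y, m, d
  }
  {
    show("YYYYMMDD", substr($0, 1, 4), substr($0, 5, 2), substr($0, 7, 2))
    show("DDMMYYYY", substr($0, 5, 4), substr($0, 3, 2), substr($0, 1, 2))
    show("MMDDYYYY", substr($0, 5, 4), substr($0, 1, 2), substr($0, 3, 2))
  }'
}

readings 11082011
```

`readings 11082011` should print both `DDMMYYYY -> 2011-08-11` and `MMDDYYYY -> 2011-11-08`. When more than one line comes out, the file name alone cannot tell you which date was meant.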