Get the oldest date based on date in the filename

gary_w · June 2, 2011, 11:41am

I am using ksh93 on Solaris.

OK, this may seem like a simple request at first. I have a directory that contains sets of files with a YYYYMMDD component in the name, along with other files of different filespecs. Something like this:

20110501_1.dat
20110501_2.dat
20110501_3.dat
20110502_1.dat
20110502_2.dat
20110502_3.dat
20110503_1.dat
20110503_2.dat
20110503_3.dat

I need to get the date component of the filename of the oldest set, and I want to stay flexible for future filename patterns. So far I have this:

$ ls |grep '.*[0-9]\{8\}.*'|sort |head -1|cut -c1-8
20110501

Problems with this:

The grep looks for any run of 8 digits anywhere in the name, which is not really a date format. What if future files also have other numbers in the name?
The cut assumes the date string is at the beginning of the filename, but I need to allow for it anywhere in the filename.
Whenever I start building long pipelines, that is usually a red flag that I should look for a more efficient way.

Ideally I would like to base the search on a filename mask. That is, I would read from another file or a database table that the filespec I am looking for is "filename1_YYMMDDHHMM.dat" or "YYYYMMDD_filename1.dat", and then run the command using that filespec to get the oldest set, so that I can process the sets in date order.

Perhaps the right find command or some awk magic?

I would appreciate any ideas on making this flexible.

Thanks for any info.
Gary

Perderabo · June 2, 2011, 2:55pm

How about this:

$
$ ls -1
20110501_1.dat
20110501_2.dat
20110501_3.dat
20110502_1.dat
20110502_2.dat
20110502_3.dat
20110503_1.dat
20110503_2.dat
20110503_3.dat
k1
$
$
$
$ cat k1
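#! /bin/ksh
# Collect every name containing a digit into the array "files", sorted (-s).
set -A files -s - *+([0-9])*
# The first element sorts lowest, so it carries the oldest date.
filename=${files[0]}
# Drop everything from the last underscore onward; assumes the date precedes it.
piece=${filename%_*}
echo "filename = $filename"
echo "piece = $piece"
exit 0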
$
$
$ ./k1
filename = 20110501_1.dat
piece = 20110501
$
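
If the date is not always the part before the underscore, one option is to pull out the first run of 8 digits wherever it sits in the name. The following is only a minimal sketch using plain parameter expansion (it should behave the same in ksh93, bash and other POSIX-style shells). The function name oldest_date and its variables are made up for illustration, and the 8 digits are not checked to be a real date.

```shell
#!/bin/sh
# oldest_date NAME...: print the smallest 8-digit run found in the given names.
# Hypothetical helper; any run of 8 digits is accepted as a "date".
oldest_date() {
    oldest=
    for name in "$@"; do
        # Text before the first run of 8 digits (the whole name if there is none).
        prefix=${name%%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*}
        [ "$prefix" = "$name" ] && continue    # no 8-digit run, skip this name
        rest=${name#"$prefix"}                 # starts at the 8-digit run
        stamp=${rest%"${rest#????????}"}       # keep only its first 8 characters
        if [ -z "$oldest" ] || [ "$stamp" -lt "$oldest" ]; then
            oldest=$stamp
        fi
    done
    [ -n "$oldest" ] && printf '%s\n' "$oldest"
}

oldest_date 20110503_1.dat k1 report_20110501_x.dat 20110502_2.dat   # prints 20110501
```

In a real directory the call would be something like oldest_date *, which avoids the ls/grep/sort/head/cut pipeline entirely.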

Shell_Life · June 2, 2011, 3:19pm

Your requirements can produce a very involved solution.

Since you already know how to sort/head/cut, it seems that your major issue is with the regular expression.

This piece of code builds a regular expression that matches strings containing a date in the format YYYYMMDD:

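#!/usr/bin/ksh
# Years: 1950-1999 and 2000-2019 (widen the second range for later years).
mYYYY_19xx='19[5-9][0-9]'
mYYYY_20xx='20[01][0-9]'
mYYYY="${mYYYY_19xx}|${mYYYY_20xx}"
# Two-digit building blocks: 01-09, 10-19, 20-29, 30-31.
m0x='0[1-9]'
m1x='1[0-9]'
m2x='2[0-9]'
m3x='3[01]'
# Months: 01-12.
mMM_1x='1[0-2]'
mMM="${m0x}|${mMM_1x}"
# Days: 01-31 (not checked against the month).
mDD="${m0x}|${m1x}|${m2x}|${m3x}"
# Print the lines of input_file that contain a plausible YYYYMMDD date.
egrep "(${mYYYY})(${mMM})(${mDD})" input_file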
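
The same expression can also be used to extract the date itself rather than just select lines. This is a rough sketch, assuming an awk that supports match() with RSTART and RLENGTH (on Solaris that would probably be nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). The function name oldest_stamp is made up for the example, and the expression is the one above written out in shortened form.

```shell
#!/bin/sh
# Same ranges as the expression above: years 1950-2019, months 01-12, days 01-31.
year='19[5-9][0-9]|20[01][0-9]'
month='0[1-9]|1[0-2]'
day='0[1-9]|[12][0-9]|3[01]'
date_re="(${year})(${month})(${day})"

# oldest_stamp: read names on stdin, print the smallest date found in them.
# Hypothetical helper; names without a matching date are ignored.
oldest_stamp() {
    awk -v re="$date_re" '
        # match() sets RSTART/RLENGTH to the leftmost match, wherever it sits.
        match($0, re) { print substr($0, RSTART, RLENGTH) }
    ' | sort | head -n 1
}

printf '%s\n' 20110503_1.dat k1 data_20110501_2.dat 20110502_3.dat | oldest_stamp   # prints 20110501
```

Against a directory this would be fed with ls | oldest_stamp. Because YYYYMMDD sorts correctly as plain text, a simple sort is enough.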

From here you can change this code to include other data manipulations, including other date formats (MMDDYY, DDMMYYYY, etc.).
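
For the filename-mask idea from the original question, one possible direction is to translate the mask into a regular expression and use the position of the placeholders to cut out the timestamp. This is only a sketch under some loud assumptions: the uppercase letters Y, M, D and H appear in the mask only as placeholders, they form one contiguous run, and the rest of the mask is literal text. The function name oldest_for_mask is invented for the example.

```shell
#!/bin/sh
# oldest_for_mask MASK: read names on stdin, print the smallest timestamp
# among the names that fit MASK (hypothetical helper, see assumptions above).
oldest_for_mask() {
    mask=$1
    before=${mask%%[YMDH]*}        # literal text in front of the placeholders
    run=${mask#"$before"}
    run=${run%%[!YMDH]*}           # the placeholder run itself, e.g. YYYYMMDD
    start=$(( ${#before} + 1 ))    # 1-based position of the timestamp
    len=${#run}                    # number of digits in the timestamp
    # Build an anchored regex: dots become [.], each placeholder becomes [0-9].
    re="^$(printf '%s\n' "$mask" | sed -e 's/\./[.]/g' -e 's/[YMDH]/[0-9]/g')\$"
    awk -v re="$re" -v s="$start" -v n="$len" '
        $0 ~ re { print substr($0, s, n) }
    ' | sort | head -n 1
}

printf '%s\n' filename1_1105021200.dat filename1_1105011830.dat k1 |
    oldest_for_mask 'filename1_YYMMDDHHMM.dat'    # prints 1105011830
```

The digits are only matched by shape, not validated as a date, and a two-digit year will not sort correctly across a century boundary. The range checks from the expression above could be substituted for the plain [0-9] classes if stricter matching is needed.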