find formatted filename with date time

dpath2o · January 21, 2008, 6:39pm

Hi,

I operate HF radars along the California coast that measure ocean surface currents. The devices are controlled and logged by software running on Mac OS. The software generates thousands of files a week, and while I've used Perl in the past to find files, I've come to realize that this type of search has some inherent inefficiencies. Ultimately I'd like to find files using the Unix find command; however, my knowledge of regular expressions is in its infancy and I'm still learning. So I'm turning to this forum in the hope of getting some hints.

Fortunately, the files are all fairly standardized, i.e. the filename format is:

TYPE_SITE_[YY|YYYY]_MO_MD_[HHMM|HHMMSS].[xx|xxx]

where,
TYPE is a three or four character string depicting the type of data (CSS, CSQ, RDLi, STAT, etc.)
SITE is a four character string depicting the name of the radar site (GCYN, NPGS, BML1, etc.)
YYYY is a two or four digit year
MO is a two digit month
MD is a two digit month day
HH is a two digit hour
MM is a two digit minute
SS is a two digit second
xxx is a two or three character string for a file extension

The main difficulty I have is in crossing years, e.g. finding CSS files for site BML1 between Dec. 29th, 2007 and Jan. 2nd, 2008. Any advice or pointers on how to attack this using the find, grep, or egrep commands would be greatly appreciated.

Thanks,
dpath2o
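
One way to attack the cross-year range with find and egrep alone is to spell out each month in the range as its own alternative. A minimal sketch, assuming the filename format described above and years written as either 2007 or 07; the function name find_range and its directory argument are hypothetical, and grep -E is used as the POSIX spelling of egrep:

```shell
#!/bin/sh
# Hypothetical helper: list CSS files for site BML1 dated
# Dec 29, 2007 through Jan 2, 2008 (inclusive) under directory $1.
# The year may be written as 2007 or 07, hence the optional "(20)?".
find_range() {
    find "$1" -type f -name 'CSS_BML1_*' |
        grep -E '/CSS_BML1_((20)?07_12_(29|30|31)|(20)?08_01_0[12])_[^/]*$'
}
```

Called as find_range ., this prints matching paths one per line. The pattern has to be rewritten for every new range, which is its main drawback.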

Smiling_Dragon · January 21, 2008, 8:48pm

Are the files' modification date/time the same as the filename? If so, use find with the -mtime +<days> -mtime -<days> flags to set the min and max age to look between.
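
Note that -mtime counts age in 24-hour periods back from the moment find runs, so the day counts have to be recomputed for each search. For fixed calendar dates, a common alternative is to create two reference files with touch -t and compare against them with -newer. A minimal sketch, assuming the modification times really do match the filenames; the function name find_between and its arguments are hypothetical, and mktemp -d is assumed to be available:

```shell
#!/bin/sh
# Hypothetical helper: list files under directory $1 modified after $2 and
# no later than $3. Timestamps use the touch -t format [[CC]YY]MMDDhhmm.
find_between() {
    ref=$(mktemp -d) || return 1
    touch -t "$2" "$ref/start"
    touch -t "$3" "$ref/end"
    find "$1" -type f -newer "$ref/start" ! -newer "$ref/end"
    rm -rf "$ref"
}
```

For example, find_between . 200712290000 200801022359 would cover Dec 29, 2007 through the end of Jan 2, 2008.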

dpath2o · January 21, 2008, 10:29pm

No, unfortunately not. File creation, access, and modification times can, and often do, differ from the date in the filename. In fact, to be most accurate I should ultimately search for the "%TimeStamp: ..." string inside the text files, but once I figure out this regex I think I'll be able to apply it with awk or maybe sed to read that line inside the file. Long answer to your short question!
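
Pulling that line out of each file needs no more than a match on the literal prefix. A minimal sketch that prints the first line starting with %TimeStamp: from each file, prefixed with the filename; the layout of the text after the colon is not given in this thread, so the sketch makes no attempt to parse it. The function name first_timestamp is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: for each file given, print "filename: line" for the
# first line that starts with the literal string "%TimeStamp:".
first_timestamp() {
    awk 'FNR == 1 { seen = 0 }      # new file: reset the per-file flag
         !seen && index($0, "%TimeStamp:") == 1 {
             print FILENAME ": " $0
             seen = 1
         }' "$@"
}
```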

ghostdog74 · January 21, 2008, 10:52pm

Assuming all files end in .xtx, and using GNU awk:

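ls *.xtx | awk 'BEGIN { FS="[:_]"   # split names on "_": TYPE SITE YYYY MM DD HHMMSS.ext
 printf "Enter from date [yyyy mm dd]: "
 getline startymd < "/dev/tty"
 printf "Enter to date [yyyy mm dd]: "
 getline endymd < "/dev/tty"
 # mktime (GNU awk only) takes "YYYY MM DD HH MM SS" and returns epoch seconds
 s=startymd " 00 00 00"
 stdate = mktime(s)
 e=endymd " 00 00 00"
 edate = mktime(e)
}
{
 # midnight of the date embedded in the filename (fields 3 to 5)
 find = $3" "$4" "$5" 00 00 00"
 founddate=mktime(find)
 # strict comparison: files dated exactly on either bound are not reported
 if ( (founddate > stdate) && (founddate < edate) ) {
   print "Found file: " $0
 }
}'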

output:

# ls -1 *xtx
CSQ_NPGS_2008_01_22_120012.xtx
CSS_GCYN_2008_01_18_130022.xtx
STAT_BML1_2007_11_22_110012.xtx
STAT_BML1_2007_12_22_110012.xtx
# ./test.sh
Enter from date [yyyy mm dd]: 2007 12 11
Enter to date [yyyy mm dd]: 2008 01 18
Found file: STAT_BML1_2007_12_22_110012.xtx

Please do more testing on your own.

dpath2o · January 21, 2008, 11:19pm

This looks GREAT!

I'm working on it, but at present I get this error message:
"awk: calling undefined function mktime source line number 7"

ghostdog74 · January 21, 2008, 11:43pm

That's a GNU awk (gawk) feature; other awk implementations may not provide mktime.
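
Because the filename already carries year, month, and day in most-significant-first order, the range test can also be done without mktime: joining the three fields into a single YYYYMMDD number and comparing numerically should work in any POSIX awk, and it crosses year boundaries naturally. A minimal sketch, assuming the filename format from the first post; the function name filter_dates, its arguments, and the rule for expanding two-digit years are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: read filenames (optionally with leading directories)
# on stdin and print those whose embedded date lies in [$1, $2], inclusive.
# Both bounds are given as YYYYMMDD.
filter_dates() {
    awk -v start="$1" -v end="$2" '
    {
        n = split($0, path, "/")               # drop any leading directories
        if (split(path[n], f, "_") < 5) next   # need TYPE SITE YEAR MO MD
        year = f[3] + 0
        # Assumption: two-digit years 90-99 mean 19xx, 00-89 mean 20xx
        if (length(f[3]) == 2) year += (year >= 90 ? 1900 : 2000)
        ymd = year * 10000 + f[4] * 100 + f[5]
        if (ymd >= start + 0 && ymd <= end + 0) print
    }'
}
```

For the original question this would be run as find . -name 'CSS_BML1_*' | filter_dates 20071229 20080102. Unlike the gawk version above, the bounds here are inclusive, so a file dated exactly on the start or end date is reported.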

dpath2o · February 5, 2008, 10:20pm

OK, so here's the regex I came up with that meets my needs. I use it in a Perl script to find files and then parse each filename into the parts I need.

if ($_ =~ m/^($type)[-_\s](\w{4})[-_\s](\d{4}|\d{2})[-_\s](\d{2})[-_\s](\d{2})[-_\s]?(\d{2})?(\d{2})?(\d{2})?[.]?(\w{2,4})?/i) { # TYPE_SITE_[YY|YYYY]_MO_MD[_HHMM|_HHMMSS].[xx|xxx]
        $Ftype   = $1;
        $Fsite   = $2;
        $yyyy    = $3;
        $mm      = $4;
        $dd      = $5;
        $HH      = defined $6 ? $6 : 0;   # time fields are optional; default to 0
        $MM      = defined $7 ? $7 : 0;
        $SS      = defined $8 ? $8 : 0;
        $Fsuffix = $9;
        # expand two-digit years: 90-99 -> 1990-1999, 00-89 -> 2000-2089
        $yyyy += 1900 if $yyyy >= 90 && $yyyy <= 99;
        $yyyy += 2000 if $yyyy < 90;
        $tmpFile = $_;
        $tmpFILE = $File::Find::name;
        $tmpDir  = $File::Find::dir;
        # \Q...\E keeps characters such as "." in the directory name literal
        if ($yyyy < 1980 || ($DirsDont =~ m/\Q$tmpDir\E/) ) { # don't attempt to perform work on these files
            print "!!! $_ is not understood ... skipping\n";
            return;
        }
        # Mktime(year, month, day, hour, min, sec), e.g. as exported by Date::Calc
        $tmpT = Mktime($yyyy,$mm,$dd,$HH,$MM,$SS);
}