Shell script question about filenames and dates

Hi Unix Gurus

Let's say I have input files like the following, and I need to pick files based on my run date.

abcd_20180206.csv
abcd_20180213.csv
abcd_20180220.csv
abcd_20180227.csv
efgh_20180206.csv
efgh_20180220.csv
efgh_20180227.csv
ijkl_20180206.csv
ijkl_20180213.csv
ijkl_20180220.csv
ijkl_20180227.csv
mnop_20180206.csv
mnop_20180213.csv
mnop_20180220.csv
mnop_20180221.csv
mnop_20180227.csv
qrst_20180206.csv
qrst_20180213.csv
qrst_20180220.csv
qrst_20180227.csv
uvwx_20180206.csv
uvwx_20180213.csv
uvwx_20180220.csv
uvwx_20180227.csv
yyzz_20170606.csv

So if I am running for 20180206, 20180209, or 20180212, the output should be:

abcd_20180206.csv
efgh_20180206.csv
ijkl_20180206.csv
mnop_20180206.csv
qrst_20180206.csv
uvwx_20180206.csv
yyzz_20170606.csv

And if I am running for 20180213, 20180215, or 20180219, the output should be:

abcd_20180213.csv
efgh_20180206.csv
ijkl_20180213.csv
mnop_20180213.csv
qrst_20180213.csv
uvwx_20180213.csv
yyzz_20170606.csv

Also, if I am running for 20180221, the output should be:

abcd_20180220.csv
efgh_20180220.csv
ijkl_20180220.csv
mnop_20180221.csv
qrst_20180220.csv
uvwx_20180220.csv
yyzz_20170606.csv

Can someone please help me with this?

Thanks & regards,
SK

This is far from clear. Does it mean "for each of the groups defined by the first four characters, select the file with the latest date stamp less than or equal to the date parameter string given"?

The prefix of the file name can vary in length: some file names have 8 letters and others may have 80.

However, the last 8 characters before .csv always represent the date, and for each prefix I need to take the latest file dated on or before the run date.

You don't seem to bother much explaining your problem. Why should anyone in here care more about your problem than you do?

Maybe I am not phrasing this correctly. I will have a bunch of files, with several dated versions of each file. For example, my list may contain:

abcdefgh_20180102.csv
abcdefgh_20180120.csv
xyz_20180121.csv
xyz_20180102.csv

So passing the value 20180118 should give me:

abcdefgh_20180102.csv
xyz_20180102.csv

and passing the value 20180121 should give me:

abcdefgh_20180120.csv
xyz_20180121.csv

The files' timestamps have to be ignored, because a file name can carry a past date while the file itself has a current timestamp. So I should look only at the file names, specifically the last 8 characters before the extension, which represent the date.

Sorry if I am not able to explain clearly.

Regards,
SK
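The rule described above (per prefix, the newest file whose name date is on or before the run date, ignoring file timestamps) could be sketched roughly as follows. This is only an illustration under stated assumptions: pick_latest is a hypothetical helper name, the date is assumed to be the 8 digits after the last "_", and prefixes with no file on or before the run date are simply omitted, which the thread does not specify.

```shell
# Hypothetical helper: reads file names on stdin, one per line, and prints,
# for each prefix, the newest file dated on or before the run date ($1).
pick_latest() {
    awk -v run="$1" '
    {
        name = $0
        sub(/\.csv$/, "", name)          # drop the extension
        # Assumption: the date is the 8 digits after the last "_".
        if (!match(name, /_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/)) next
        prefix = substr(name, 1, RSTART - 1)
        date   = substr(name, RSTART + 1)
        # 8-digit YYYYMMDD values order the same way as text or as numbers.
        if (date <= run && date > best[prefix]) {
            best[prefix] = date
            file[prefix] = $0
        }
    }
    END { for (p in file) print file[p] }' | sort
}

# Possible usage in the directory holding the files:
#   ls | pick_latest 20180118
```

Because the best candidate is tracked per prefix inside awk, this sketch does not depend on the order in which the names arrive; the final sort only makes the output order stable.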

With the "run date" as the first positional parameter, and assuming you are using a reasonably recent Bourne-compatible shell, try

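ls *.csv | {
# Relies on ls sorting its output, so each group's files arrive in ascending date order.
while IFS="_." read -r GR DT XT
  do    # Group changed: print the file selected for the previous group.
        [ ! "$GR" = "${OGR:-$GR}" ] &&          { echo "${OGR}_${ODT}.${OXT}"
                                                  ODT=""
                                                }
        # Keep the newest date not later than the run date ($1); the first
        # file of a group is always taken as a fallback.
        [ "$DT" -gt "$1" ] && [ "$ODT" ]   ||     ODT=$DT
        OGR=$GR
        OXT=$XT
  done
echo "${OGR}_${ODT}.${OXT}"
}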

Thanks a lot for your reply. I will try this.

Perhaps I'm missing something in your requirement, but this script prints the files in the current directory whose names carry exactly the date string passed as argument #1 (defaulting to today's date if no parameter is passed):

DT=${1:-$(date +%Y%m%d)}
for file in *_${DT}.csv
do
   [ -f "$file" ] && echo "$file"
done

Hi

I tried this, but I think the code takes the string after the first "_" as the date. If some of my files follow naming conventions like the ones below, how do we deal with that?

abcdefgh_20180102.csv
abcdefgh_20180120.csv
xyz_20180121.csv
xyz_20180102.csv
abcdefgh_xyz_pqr_20180102.csv
abcdefgh_xyz_pqr_20180109.csv
abcdefgh_65325_parent_20180120.csv
abcdefgh_65325_parent_20180127.csv
xyz_34567_filter_20180121.csv
xyz_34567_filter_20180128.csv

Thanks & regards,
SK

Untested

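ls *.csv  |
{ while read -r LINE
    do  GR=${LINE%_*}           # group: everything before the last "_"
        DT=${LINE##*_}          # date plus extension
        DT=${DT%.*}             # strip the extension
        # Group changed: print the file selected for the previous group.
        [ ! "$GR" = "${OGR:-$GR}" ] &&          { echo "$OLN"
                                                  ODT=""
                                                }
        # Remember the line together with its date, so that the selected
        # file is printed rather than the last one read for the group.
        [ "$DT" -gt "$1" ] && [ "$ODT" ]   ||   { ODT=$DT
                                                  OLN=$LINE
                                                }
        OGR=$GR
  done
echo "$OLN"
}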