Date Sorting

Hi,

I have a list of files named in the format ABCDE_yymmdd and wish to sort them in ascending date order. I can't use the files' Unix timestamps, as these could differ from the date given in the file name.

Does anyone know of a way this can be done using Unix shell scripting and/or awk (nawk)?

What I already have:

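constructFileList()
{
    # Start from an empty list file (rm -f does nothing if it is missing)
    rm -f "$DataDir/$FileListFile"

    # Let the shell expand the glob instead of parsing ls output.
    # sort.awk prepends a 6-character date key to each name, sort orders
    # the lines by that key, and ${FILENAME#??????} strips it off again.
    for FILE in "$DataDir"/ABCDE_??????.dat
    do
        [ -f "$FILE" ] || continue    # skip the literal pattern if nothing matched
        basename "$FILE" | nawk -f sort.awk
    done | sort | while read -r FILENAME
    do
        echo "${FILENAME#??????}" >> "$DataDir/$FileListFile"
    done
}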

The sort.awk script:

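# Build a sort key from the date in an ABCDE_xxxxxx.dat name: take the
# two-digit groups at positions 11-12, 9-10 and 7-8 (i.e. in reverse order)
# and prepend that 6-character key to the original string.
# return_str is declared as an extra parameter so it stays local.
function prisort(STRING,    return_str)
{
    return_str = substr(STRING,11,2) substr(STRING,9,2) substr(STRING,7,2) STRING
    return return_str
}

# Main routine: print the keyed version of each input name
{
    print prisort($1)
}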
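The two pieces above follow a decorate/sort/strip pattern: awk prepends a key, sort orders the lines, and the key is removed again. A minimal sketch of the same idea in one pipeline, assuming the names always look like ABCDE_xxxxxx.dat with the date in characters 7 to 12 (the positions sort.awk reads). The function name sort_by_name_date is hypothetical, and plain awk is used here; on Solaris, nawk would be the equivalent.

```shell
# Hypothetical helper: sort ABCDE_xxxxxx.dat names read from stdin.
# Reverse the two-digit groups at positions 7-8, 9-10, 11-12 into a
# 6-character key, prepend it, sort, then cut the key off again.
sort_by_name_date() {
    awk '{ print substr($0,11,2) substr($0,9,2) substr($0,7,2) $0 }' |
    sort |
    cut -c7-
}
```

Usage would be something like `(cd "$DataDir" && ls ABCDE_??????.dat) | sort_by_name_date > "$DataDir/$FileListFile"`. Like the original, the key has only a two-digit year, so it does not handle dates on both sides of 2000.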

Thanks

ls PRIML_* | sort -t"_" -k2
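This works because, when the part after the underscore really is yymmdd, plain text order of the digits is also date order (as long as all years are in the same century). A small check with made-up names in that layout:

```shell
# Made-up names in the ABCDE_yymmdd layout from the original question
names='ABCDE_080315
ABCDE_070101
ABCDE_071231'
# -t"_" splits each line at '_'; -k2 makes the key run from field 2
# (the yymmdd part) to the end of the line, so ABCDE is never compared
sorted=$(printf '%s\n' "$names" | sort -t"_" -k2)
printf '%s\n' "$sorted"
```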

Wow, impressive, so much easier than what I was doing. Thanks anbu23! Can you give a brief explanation of what the -t "_" -k2 does? I would like to understand what I am using so I can build my knowledge.

Edit: I think I get what the -t option is for: to split the string so that I can sort on just the date part?

Thanks.

LiquidChild,
For the "sort" command:
-t --> specifies the field separator.
-k --> specifies the key field(s) to sort on.

If you type man sort, you can see a full explanation of the "sort" command on your system.

-t char
Use the single character char as the field separator, instead of the default of whitespace.

-k field_start[type][,field_end[type]]
Define the sort key field.
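To make the two options concrete, here is a small comparison on made-up lines with three '_'-separated fields. Without -k the whole line is compared; with -t_ -k2,2 only field 2 is compared (a plain -k2 would instead run from field 2 to the end of the line).

```shell
# Made-up lines with three '_'-separated fields
lines='b_2_x
a_3_y
c_1_z'
# No -k: the whole line is the key, so the first field decides
whole=$(printf '%s\n' "$lines" | sort)
# -t_ -k2,2: the key starts and ends at field 2
field2=$(printf '%s\n' "$lines" | sort -t_ -k2,2)
printf '%s\n' "$whole" "---" "$field2"
```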

This does not seem to work for the following, though:

FILE_230580.ok
FILE_230590.ok
FILE_010107.ok

The resulting order is:

FILE_010107.ok
FILE_230580.ok
FILE_230590.ok

$ ls FILE_*
FILE_010107.ok  FILE_230580.ok  FILE_230590.ok
$ ls FILE_* | sort -t"_" -k2
FILE_010107.ok
FILE_230580.ok
FILE_230590.ok

There are two problems:

  • These files seem to be in ddmmyy format, so the day is compared first.
  • A two-digit yy year is not sortable across centuries: with this format, 2007 (07) sorts before 1980 (80).

In that case you can do something like this:

$ ls FILE_*
FILE_010107.ok  FILE_230580.ok  FILE_230590.ok
$ ls FILE_* | \
> awk -F_ \
>   '{
>      dd=substr($2,1,2);
>      mm=substr($2,3,2);
>      yy=substr($2,5,2);
>      print (yy<70 ? "20" : "19") yy mm dd, $0 ;
>    }' | \
> sort -k1,1 | \
> cut -d' ' -f2
FILE_230580.ok
FILE_230590.ok
FILE_010107.ok
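The same pipeline can be wrapped in a function so it reads names from stdin. This is a sketch, not part of the original answer: the function name sort_ddmmyy and the PIVOT variable (the cut-off year, defaulting to the 70 used above) are hypothetical. Adding 0 to yy makes awk compare it as a number rather than as a string, and cut -f2- keeps the rest of the line even if a name contains spaces.

```shell
# Hypothetical helper: read NAME_ddmmyy.ext names on stdin, print oldest first.
# Two-digit years below PIVOT (default 70) are taken as 20yy, others as 19yy.
sort_ddmmyy() {
    awk -F_ -v pivot="${PIVOT:-70}" '{
        dd = substr($2, 1, 2)
        mm = substr($2, 3, 2)
        yy = substr($2, 5, 2)
        # yy + 0 forces a numeric comparison; the key is yyyymmdd
        century = (yy + 0 < pivot + 0) ? "20" : "19"
        print century yy mm dd, $0
    }' |
    sort -k1,1 |
    cut -d' ' -f2-
}
```

For example, `ls FILE_* | sort_ddmmyy` should give the same order as the pipeline above.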

Jean-Pierre.

What if the years were only 2000 or later, so 07, 08, 09 etc.? Would that make the solution simpler? To be honest, it's unlikely we will have a file from before 2000.

ls FILE* | sort -t_ -k2.5,2.6 -k2.3,2.4 -k2.1,2.2
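A quick check of that command on made-up names, all assumed to be from 2000 or later: the year slice is compared first, then month, then day.

```shell
# Made-up names, all with years 2000 or later (ddmmyy after the underscore)
names='FILE_150308.ok
FILE_010107.ok
FILE_311207.ok'
# Sort by yy (chars 5-6 of field 2), then mm (3-4), then dd (1-2)
sorted=$(printf '%s\n' "$names" | sort -t_ -k2.5,2.6 -k2.3,2.4 -k2.1,2.2)
printf '%s\n' "$sorted"
```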

Jean-Pierre.

Thanks Jean, I will give that a go. I tried using multiple keys, but have not seen the '.' notation before. What does it mean?

From the sort man page:

Jean-Pierre.

Thanks for all the help, Jean-Pierre.

I have read the above but am not clear on one thing. I get that .5 and .6 refer to the last two characters of the date, i.e. the year, which is sorted on first, and the same for .3/.4 and .1/.2. But I don't really understand the 2. and 3. part. While I know the number before the . refers to the field and the part after it refers to the character within the field, I don't understand how it can be 2.5 and 2.6. How are fields split up in the string, i.e. what exactly are field 2 and field 3?

Sorry, I did read the above and have looked for some further text and examples, and while it is a bit clearer I still don't fully understand it.

I have corrected the typing errors:

ls FILE* | sort -t_ -k2.5,2.6 -k2.3,2.4 -k2.1,2.2

With '_' as the field separator, field 1 is FILE and field 2 is the date part, in the form ddmmyy.ok:
-k2.5,2.6 : Chars 5 to 6 of field 2 -> yy
-k2.3,2.4 : Chars 3 to 4 of field 2 -> mm
-k2.1,2.2 : Chars 1 to 2 of field 2 -> dd
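To see exactly which characters each key covers, the same field and character positions can be pulled out with cut. A small sketch using one name from this thread:

```shell
name=FILE_230580.ok
# Field 2 when splitting on '_' (field 1 is FILE)
field2=$(printf '%s\n' "$name" | cut -d_ -f2)
# Characters 1-2, 3-4 and 5-6 of field 2, as in -k2.1,2.2 -k2.3,2.4 -k2.5,2.6
dd=$(printf '%s\n' "$field2" | cut -c1-2)
mm=$(printf '%s\n' "$field2" | cut -c3-4)
yy=$(printf '%s\n' "$field2" | cut -c5-6)
printf 'field2=%s dd=%s mm=%s yy=%s\n' "$field2" "$dd" "$mm" "$yy"
```

This prints field2=230580.ok dd=23 mm=05 yy=80.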

Jean-Pierre.

Ok, got it! Thanks again.