I have a directory with many files whose creation time is distributed all over the day.
I need ANY 20 files per hour. So, I need
20 files for hour 00
20 files for hour 01
...
20 files for hour 23
What I have done so far is not great. Here is the code:
# get the Month
a=`echo $(date) | awk -F" " '{print $2}'`
# get the date
b=`echo $(date) | awk -F" " '{print $3}'`
c="$a $b"
# loop over hours and minutes to generate all possible combinations
for (( hh=0; hh<=23; hh++ ))
do
  for (( mm=0; mm<=59; mm++ ))
  do
    # zero-pad hour and minute to two digits
    h=$(printf '%02d' "$hh")
    m=$(printf '%02d' "$mm")
    # this is the final pattern for which I need files
    pattern=" $c $h:$m"
    echo "$pattern"
  done
done
Still, this is incomplete. What I want to do next is, inside the loop, grep with the pattern and delete all but one of the matching files, so that I have 1 file per minute.
I hope that there is a much better way to do this. I just need any 'x' number of files per hour.
I agree with ahamed101; your requirements are not clear.
Do you want a list of 20 files per hour from the current working directory for the current date?
Do you want to delete all but 20 files for each hour from the current working directory for today's date? If so, do you care which 20 files are kept?
Note that since you're using today's date, there is no way to guarantee that other files won't be created in hour 23 after you run this script unless you think you can guarantee that this script will run in its entirety during the last clock tick of the day on your system and that no other CPUs will be used to create another file during or after the time when this script runs.
I was just pointing out that if you run your script at 10pm, there may be a lot more than 20 files in the 22 and 23 hours of that day.
Even if you run your script at 2359, files may still be created in the last minute of the day that will leave you with more than 20 files in the 23 hour by the time you get to midnight. And, if this script is run by cron, there is no guarantee that if you schedule it for 2359 (or even 2330) that it will start running before midnight.
Also, do you want to check the last modified time or creation time of the file?
Getting the creation time of the file may not be straightforward. Is your filesystem ext4?
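For what it's worth, the last modification time is usually easy to get at. Here is a minimal sketch, assuming GNU date, where -r FILE reports the file's last modification time (on some BSD-derived systems -r expects a number of seconds instead, so check your man page first). The function name mtime_hour is made up for this example.

```shell
# Hypothetical helper: print the two-digit hour (00-23, local time) of a
# file's last modification time. Assumes GNU date, where "-r FILE" means
# "use the last modification time of FILE".
mtime_hour() {
    date -r "$1" +%H
}
```

Calling mtime_hour on a file prints something like 09 or 22, which is the hour bucket the file would fall into.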
The following seems to do what you want as long as:
you want to use the last data modification timestamp when selecting files,
none of the filenames in the directory where you run this program contain any whitespace characters,
none of the filenames in the directory where you run this program contain any characters that are special to the shell (such as the dollar sign, parentheses, less-than and greater-than signs, the asterisk, and square or curly brackets),
this program does not reside in the directory where you run this program, and
the ls -l output on your system adheres to the requirements set by the POSIX standards (i.e., the 6th field contains the abbreviated month name, the 7th field contains the day of the month, the 8th field contains the time in 24 hour format, and the 9th field contains the filename).
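#!/bin/ksh
# Read today's abbreviated month name and day of month into m and d.
date "+%b %e" | (read m d
ls -l | awk -v m="$m" -v d="$d" '
# Skip files not modified today and the first 20 files seen for each hour.
$6 != m || $7 != d || c[substr($8,1,2)]++ < 20 { next }
# Emit an rm command for every remaining (surplus) file.
{printf("rm %s\n", $9)}' |
ksh -v
)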
If you don't want to see the list of files being removed, change the line:
ksh -v
to:
ksh
I tested this using ksh on Mac OS X, but if you change both occurrences of ksh in this script to the name of any other shell that accepts basic Bourne shell syntax, it should still work. If you are going to run this on a Solaris system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.
I strongly suggest that you replace the rm in the printf statement of the script above with echo until you have verified that it does what you want it to do.
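One way to do that kind of check is to pull the counting step out into a function that only prints the names it would select. This is just a sketch under the same assumptions as the script above (POSIX ls -l field layout, no whitespace or shell-special characters in filenames). The function name excess_files and its three arguments are made up for this example.

```shell
# Hypothetical helper: read "ls -l"-style lines on stdin and print the names
# of the files beyond the first `keep` seen for each hour of the given day.
#   $1 = abbreviated month name, $2 = day of month, $3 = files to keep per hour
excess_files() {
    awk -v m="$1" -v d="$2" -v keep="$3" '
        # Ignore entries that were not modified on the requested day.
        $6 != m || $7 != d { next }
        # Ignore the first `keep` entries seen for this hour.
        seen[substr($8, 1, 2)]++ < keep { next }
        # Everything else is surplus for its hour: print the name only.
        { print $9 }
    '
}
```

Running date "+%b %e" | { read m d; ls -l | excess_files "$m" "$d" 20; } lists the surplus files for today without touching anything. Once the list looks right, the names can be fed to rm.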