Delete File in a Directory Using a Condition

shekhar2010us · September 3, 2013, 3:19pm

Hello,

I have a directory with many files whose creation time is distributed all over the day.

I need ANY 20 files per hour. So, I need
20 files for hour 00
20 files for hour 01
...
20 files for hour 23

What I have done so far is not great. Here is the code:

# get the Month
a=`echo $(date) | awk -F" " '{print $2}'`
# get the date
b=`echo $(date) | awk -F" " '{print $3}'`
c="$a  $b"

# loop over hours and minutes to generate all possible combination
for (( hh=0; hh<=23; hh++ ))
do
  for (( mm=0; mm<=59; mm++ ))
  do
    if [ "${#hh}" == 1 ]; then
      h=0$hh
    else
      h=$hh
    fi
    if [ "${#mm}" == 1 ]; then
      m=0$mm
    else
      m=$mm
    fi
    # this is the final pattern for which I need files
    pattern=" $c $h:$m"
    echo $pattern
  done
done

This is still incomplete. What I want to do next, inside the loop, is grep with the pattern and delete all but one of the matching files, so that I have 1 file per minute.

I hope that there is a much better way to do this. I just need any 'x' number of files per hour.

Thanks

ahamed101 · September 3, 2013, 3:28pm

Didn't quite get your requirement.
You need 1 file per minute, yet you said you select 20 files per hour :rolleyes: and what is this pattern?

--ahamed

shekhar2010us · September 3, 2013, 3:57pm

I need 20 files per hour. But the code I have written is for 60 files per hour.
That's why I said this is not the best solution.

My requirement is:
From a directory having files created at different times, I need 20 files created in the same hour, for every hour.

ahamed101 · September 3, 2013, 4:02pm

So, you need only 20 files per hour for every hour and want to delete the remaining files in the directory?

--ahamed

Don_Cragun · September 3, 2013, 4:21pm

I agree with ahamed101; your requirements are not clear.

Do you want a list of 20 files per hour from the current working directory for the current date?

Do you want to delete all but 20 files for each hour from the current working directory for today's date? If so, do you care which 20 files are kept?

Note that since you're using today's date, there is no way to guarantee that other files won't be created in hour 23 after you run this script unless you think you can guarantee that this script will run in its entirety during the last clock tick of the day on your system and that no other CPUs will be used to create another file during or after the time when this script runs.

shekhar2010us · September 3, 2013, 4:44pm

Sorry for the unclear requirements. I could not find better words to explain it.

Yes, I need to keep 20 files per hour, for every hour, in the directory and delete the remaining files. It can be ANY 20 files.

---------- Post updated at 04:44 PM ---------- Previous update was at 04:41 PM ----------

Don, The script that I provided will be run everyday, because the file creation is a continuous process.

Don_Cragun · September 3, 2013, 4:56pm

I was just pointing out that if you run your script at 10pm, there may be a lot more than 20 files in hours 22 and 23 of that day.

Even if you run your script at 23:59, files created in the last minute of the day may still leave you with more than 20 files in hour 23 by the time you get to midnight. And if this script is run by cron, there is no guarantee that a job scheduled for 23:59 (or even 23:30) will start running before midnight.

ahamed101 · September 3, 2013, 5:05pm

Also, do you want to check the last modified time or creation time of the file?
Getting the creation time of the file may not be straightforward. Is your filesystem ext4?

--ahamed

Don_Cragun · September 3, 2013, 6:32pm

The following seems to do what you want as long as:

you want to use the last data modification timestamp when selecting files,
none of the filenames in the directory where you run this program contain any whitespace characters,
none of the filenames in the directory where you run this program contain any characters that are special to the shell (such as dollar sign, parentheses, less than and greater than signs, asterisk, and angle, square, and squiggly brackets),
this program does not reside in the directory where you run this program, and
the ls -l output on your system adheres to the requirements set by the POSIX standards (i.e., the 6th field contains the abbreviated month name, the 7th field contains the day of the month, the 8th field contains the time in 24 hour format, and the 9th field contains the filename).
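
The last assumption can be checked before trusting the script. A minimal sketch, assuming a Bourne-compatible shell and a POSIX awk; show_fields is a hypothetical helper name and the sample line is made up:

```shell
# Hypothetical helper: for each ls -l line on stdin, print the fields
# that the selection logic depends on.
show_fields() {
    awk '{ printf("month=%s day=%s hour=%s name=%s\n", $6, $7, substr($8, 1, 2), $9) }'
}

# Made-up sample line in POSIX ls -l format
printf '%s\n' '-rw-r--r--  1 user  staff  120 Sep  3 14:07 data_001.log' | show_fields
```

Running ls -l | show_fields in the target directory should show a month abbreviation, a day number, and a two-digit hour for every file modified today. If it does not, the field positions on that system differ and the script needs adjusting.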

#!/bin/ksh
date "+%b %e" | (read m d
        ls -l | awk -v m=$m -v d=$d '
                $6 != m || $7 != d || c[substr($8,1,2)]++ < 20 { next }
                {printf("rm %s\n", $9)}' |
        ksh -v
)

If you don't want to see the list of files being removed, change the line:

        ksh -v

to:

        ksh

I tested this using ksh on Mac OS X, but if you change both occurrences of ksh in this script to the name of any other shell that accepts basic Bourne shell syntax, it should still work. If you are going to run this on a Solaris system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.

I strongly suggest that you replace the rm in the printf in the script above with echo until you have verified that it does what you want it to do.
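
Another way to verify the selection is to run the same awk condition on a listing and only print the names. A rough sketch under the same assumptions as the script; list_surplus, its arguments, and the sample listing are hypothetical and not part of the original script:

```shell
# Hypothetical helper: read ls -l output on stdin and print the names the
# script would remove, without removing anything.
# $1 = abbreviated month, $2 = day of month, $3 = files to keep per hour (default 20)
list_surplus() {
    awk -v m="$1" -v d="$2" -v keep="${3:-20}" '
        # skip lines not dated m d, and the first "keep" files seen in each hour
        $6 != m || $7 != d || c[substr($8, 1, 2)]++ < keep { next }
        { print $9 }'
}

# Made-up listing: three files in hour 14 and one in hour 15, keeping 2 per hour
printf '%s\n' \
    '-rw-r--r--  1 user  staff  10 Sep  3 14:01 a.log' \
    '-rw-r--r--  1 user  staff  10 Sep  3 14:02 b.log' \
    '-rw-r--r--  1 user  staff  10 Sep  3 14:59 c.log' \
    '-rw-r--r--  1 user  staff  10 Sep  3 15:00 d.log' |
    list_surplus Sep 3 2
```

With the sample listing this prints only c.log, the third file in hour 14. Against a real directory it could be run as ls -l | list_surplus $(date "+%b %e"), leaving the command substitution unquoted so that the month and day arrive as two separate arguments.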