Need script to remove millions of tmp files in /html/cache/ directory

Hello,

I just saw that on my VPS (CentOS) my osCommerce installation with an SEO script
has created millions of tmp files inside the /html/cache/ directory.

I need to remove all of those files (millions of them). I tried via the shell, but the
VPS load goes very high and it hangs. Is there some way to write a bash script that will:

delete the files little by little, checking the load of the VPS and throttling the operation accordingly?

Thanks :)

cd /html/cache
pwd

# Make sure you are in the right directory first: the command below will remove every file under the current directory.

find . -type f -exec rm -f {} \;

Basically, that command is exactly what overloads the VPS and hangs it.

What I was looking for was something that could balance the server load and remove those files.

Thanks :)

You could sleep after deleting some files.

#!/bin/bash
# delete files one at a time, pausing for a second after every 1000
I=0
find /html/cache -type f -print | while read -r FILE_NAME
do
 rm -f "$FILE_NAME"
 (( I++ ))
 if (( I >= 1000 )); then
  sleep 1
  I=0
 fi
done
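
If you want the script to actually watch the load, as you asked, here's a rough sketch: after every batch it reads the 1-minute load average from /proc/loadavg and waits for it to drop before continuing. The 4.0 threshold and the batch size are assumptions, so tune them for your VPS.

#!/bin/bash
# delete in batches, pausing whenever the 1-minute load average
# climbs above MAX_LOAD (the 4.0 threshold is a guess - tune it)
MAX_LOAD=4.0
I=0
find /html/cache -type f -print | while read -r FILE_NAME
do
 rm -f "$FILE_NAME"
 (( I++ ))
 if (( I >= 1000 )); then
  I=0
  # first field of /proc/loadavg is the 1-minute load average
  while (( $(echo "$(cut -d' ' -f1 /proc/loadavg) > $MAX_LOAD" | bc) )); do
   sleep 5
  done
 fi
done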

The "hang" you're describing: is it just that the console doesn't produce any output for quite some time, or does it really stop? If it stops, how did you check this?

If it's really, truly stopping, you could try reducing the I/O load with ionice (Linux; see man ionice), e.g. ionice -c 3 find . -type f

Also, use + instead of \; to terminate the -exec command: that passes many filenames to a single invocation instead of spawning the command once per file.
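
Putting the two together, something like this (a sketch assuming GNU find, and that the ionice idle class is available on your kernel) would delete at idle I/O priority with as few rm invocations as possible:

ionice -c 3 find /html/cache -type f -exec rm -f {} +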

Can we get a more exact sizing of the problem:

How big is the directory file itself?

ls -lad /html/cache

How long does it take to traverse the tree, and how many files are in it?

date ; find /html/cache/ -type f -print | wc -l  ; date

Can you expand a bit on your CentOS VPS?
Are you renting a virtual computer from somewhere on the internet, or is this something you manage yourself?

If this is a long-distance connection, the volume of output from the commands matters, but I can't see how running file deletes could hang the VPS. This assumes there is no rollback facility or similar that would be crippled by a high volume of deletes. I must assume it did not collapse when the files were created, which is surely a similar load.

Assuming /html/cache is a simple directory structure and can be quiesced with no files open, I'd be tempted to rename the directory, create a new replacement with identical permissions, and then delete the original at my leisure.
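
A minimal sketch of that approach, assuming nothing is writing to the cache while you swap it (cache.old is just an example name):

#!/bin/bash
cd /html || exit 1
# move the bloated cache out of the way and recreate it empty
mv cache cache.old
mkdir cache
# copy the original ownership and permissions onto the replacement
chown --reference=cache.old cache
chmod --reference=cache.old cache
# the old tree can then be removed whenever the load allows, e.g.:
#   ionice -c 3 rm -rf /html/cache.old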

I think it hangs because find first has to list every file in the folder before it can operate on them. If you're fine with deleting the folder itself and creating a new one, that would speed things up, e.g.:


#!/bin/bash
cd /html/ || exit 1
mv cache cache.bak
mkdir cache
# note: the new cache directory may need the same ownership and
# permissions as the old one so the application can keep writing to it
rm -rf cache.bak

find /html/cache -type f -print0 | xargs -0 rm

is more efficient than doing

find /html/cache -type f -exec rm {} \;

since -exec with \; spawns a new rm process for each file, whereas xargs launches one process for many (the -print0 and -0 flags just keep filenames with spaces or newlines from breaking the pipeline).
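
And if your find is GNU find (the default on CentOS), -delete avoids spawning any external command at all; combined with the ionice suggestion above it keeps the impact on the VPS low:

ionice -c 3 find /html/cache -type f -delete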