Delete big directory issue

Hello folks,

I am deleting a directory with a script and it is taking about 11 hours, and it also increases the I/O load on the server. I am using the command below. Inside the date directory there are hour directories, which I delete after archiving. Archiving does not take long; only the "rm -rf" takes a lot of time and generates high IOPS. Please advise what I should do to make it faster with less I/O impact.


rm -rf $date


What is the value of $date?

What permissions was your script run with?

Make very sure that your script is only deleting what it's supposed to be deleting!

1) $date is the previous day's date, e.g. 2014-01-14
2) I am running the script as the root user.
3) I am deleting the correct directories.

If it's taking 11 hours to delete a directory, it's either a very, very full directory, or rm is deleting things you didn't intend.

If it's just a very full directory, rm doesn't have a "go faster" button for that. It's the filesystem that's slow, from having far too many files piled into one folder.

I saw a link and found that we can delete many files with the command below. Is it possible to do something similar for the date directory, which also has subdirectories, so that the whole directory is deleted properly with less IOPS? How? I am confused.

perl -e 'for(<*>){((stat)[9]<(unlink))}'

Running multiple rm's at once won't make the delete happen faster because rm is not the thing being slow here.
Using other utilities to emulate rm won't make the delete happen faster because rm is not the thing being slow here.

Your disk and filesystem are performing badly because, probably, far too many files have been crammed into a single directory.

Are there any network-mounted directories in there?
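
For example, these would show the filesystem type behind that path and whether anything NFS is mounted (run them where $date is set, or substitute the real path):

df -T $date
mount | grep -i nfs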

The actual directory structure is like this:


/2014-01-14/0-1/dir1
/2014-01-14/0-1/dir2
/2014-01-14/0-1/dir3

/2014-01-14/1-2/dir1
/2014-01-14/1-2/dir2
/2014-01-14/1-2/dir3

/2014-01-14/21-22/dir1
/2014-01-14/21-22/dir2
/2014-01-14/21-22/dir3

The script deletes 2014-01-14, so it takes a lot of time. The total size of one day's logs is around 70 GB, and each hour directory is around 2.3 GB.


Basically, it is a disk assigned from SAN storage.

Maybe you should consider putting that directory on its own filesystem, then you can delete it in no time at all.

Scott raises a good point about network-mounted directories. If there are any issues with those, that could be a bottleneck.

Nonetheless, it's not rm that's misbehaving here.

The operating system disk does not have much space, so SAN is our only option. Is there any possibility of using ionice or nice with cron so the delete uses less CPU/IO and does not create a load issue on the server?
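
For example, something like this in cron is what I have in mind (the path and schedule are just examples):

# run at 01:00 with idle I/O class and the lowest CPU priority
# note: ionice -c3 only has an effect if the device uses the CFQ I/O scheduler
0 1 * * * ionice -c3 nice -n 19 rm -rf /data/2014-01-14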

The /bin/rm binary should be the fastest.
Please state your OS and file system type!

uname
df $date

1) Linux RHEL 5
2) ext3
3) kernel 2.6
4) currently i am
5) The filesystem has about 70% of its disk space available.



Please check the link below. Is it possible we can do something similar for directories too?

How to delete million of files on busy Linux servers (Work out Argument list too long) | Walking in Light with Christ - Faith, Computing, Diary

That isn't relevant. You aren't getting the "Argument list too long" error, so that won't solve any problem you're actually having.

I know this isn't what you want to hear, but the best way to handle this problem would have been preventatively -- by not storing your files in such an unwieldy way.

rm doesn't have a "go faster" button. To make it work faster you need faster disks.

If $date is a separate file system, you can consider umount, mkfs, mount.
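
A rough sketch of that approach (the device name here is made up; mkfs destroys everything on that filesystem, so only do this if nothing else lives on it):

umount "$date"
mkfs -t ext3 /dev/san_vg/log_lv    # hypothetical SAN volume holding $date
mount /dev/san_vg/log_lv "$date"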

Is it possible to run the command below on the subdirectories in some way, with a loop? I don't understand the logic.

time perl -e 'for(<*>){((stat)[9]<(unlink))}'
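
For example, would something like this work (the path is only an example, following the structure above)?

for d in /2014-01-14/*/*/; do
    # cd into each bottom-level directory and run the one-liner there
    ( cd "$d" && perl -e 'for(<*>){((stat)[9]<(unlink))}' )
done
# then remove the now-empty directory tree
rm -rf /2014-01-14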


It's a new server; the filesystem was mounted only 2 days ago.

This isn't faster than rm. rm is not "running slow" here. If you wrote a super-efficient delete program in three assembly-language instructions, it'd still be slow.

The problem is your directory. Every time you delete a file, the kernel has to update the list of files. When the list of files is really big, this can take a long time.

I know this isn't what you want to hear but there's no "delete faster" button.

The file sizes are in kilobytes; they are txt files.

The issue, as stated before, is the number of files not their sizes.

You said the directories have about 2.3 GB of data each and the files are kilobytes in size, so I'm guessing each directory may have 100,000 or more files.

Unix filesystems are not designed to have such large numbers of files in a single directory and performance is suffering.
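
A quick way to check that guess, using the layout you posted (ls -f skips sorting, which is what makes plain ls painfully slow in huge directories):

for d in /2014-01-14/*/*/; do
    printf '%s %s\n' "$d" "$(ls -f "$d" | wc -l)"
done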

How can I run the program below in one line? I need to test that.

use strict;
use warnings;
use File::Find;

my $dir = "/data/2014-01-14/";

# find() walks the whole tree and calls wanted() once per entry,
# with the current directory set to the directory containing it
find(\&wanted, $dir);

sub wanted {
    # delete the current entry if it is a plain file ending in .xml
    unlink $_ if -f $_ && /\.xml$/;
}
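
Would either of these count as running it in one line (assuming they do the same thing as the script above)?

# Perl one-liner version of the script
perl -MFile::Find -e 'find(sub { unlink if -f && /\.xml$/ }, "/data/2014-01-14")'

# or, if your find supports -delete
find /data/2014-01-14 -type f -name '*.xml' -delete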