Hi Experts,
I need your assistance in tuning a script. I have a mount point holding roughly 4,848,008 files and 864,739 directories. The script searches for files matching specific patterns and a specific age, then deletes them to free up space. It is designed to run daily, but it currently takes around three full days to complete, so the tuning task has come to me.
Originally the script had 43 find commands to delete files and 5 more find commands to delete empty directories.
1) I consolidated these down to 7 find commands for files and a single command for directories. I think (I have not tested this approach yet) this reduces the searching and should reduce the time too.
2) Since the original commands each use -exec with find, which I suspect takes more time, my second approach is to find the files I need to delete first and then remove them with the loop below.
find "${PD}" -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
-o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
-o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -print > remove.log
while IFS= read -r ENTRY
do
    if [ -f "$ENTRY" ]; then
        rm -f "$ENTRY"
    elif [ -d "$ENTRY" ]; then
        rmdir "$ENTRY"
    fi
done < remove.log
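For comparison, GNU find can do the whole job in a single pass with no rm processes at all. This is only a sketch, assuming GNU find (whose -delete action is not POSIX) and GNU touch; the directory and file names are made up for the demo:

```shell
# Sketch: one-pass deletion with GNU find's -delete action.
# Create a throwaway stand-in for the mount point, with one stale
# file and one fresh file.
PD=$(mktemp -d)
touch -d '20 days ago' "$PD/sales (MON) report.txt"   # older than 14 days
touch "$PD/sales (MON) today.txt"                     # fresh, must survive

# Match the day-name patterns, keep only files older than 14 days,
# and let find unlink them itself -- no rm processes are forked.
find "$PD" -type f \( -name '*(MON)*' -o -name '*(TUE)*' \
    -o -name '*(WEEKLY)*' \) -mtime +14 -delete

ls "$PD"    # only "sales (MON) today.txt" remains
```

The trade-off is losing the remove.log audit trail; keeping -print alongside -delete restores it.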
So, what I am asking is: please let me know the pros and cons of approaches 1 and 2, and also whether find with -exec really takes more time or not.
Thanks
Senthil
Combining the search patterns into one find command is a good idea.
Storing the filenames in a file and then looping through its contents is slower than using -exec, so unless you want to keep a log of what was deleted, it's redundant.
Faster than doing -exec would be piping the output of find to xargs(1) like this:
find $PD <all options you need> | xargs rm
which would call rm only once for many files, as opposed to -exec, which will invoke rm for every file.
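The difference in invocation counts is easy to see with echo standing in for rm; this sketch just counts how many times each construct runs the command (the scratch directory and file names are invented for the demo):

```shell
# Sketch: compare how many processes -exec \; and xargs spawn.
# Make 5 empty files in a scratch directory.
d=$(mktemp -d); for i in 1 2 3 4 5; do : > "$d/f$i"; done

# -exec ... \; runs the command once per file: 5 invocations,
# so 5 lines of output.
find "$d" -type f -exec echo one-call-for {} \; | wc -l   # prints 5

# xargs packs all 5 names into a single command line: 1 invocation.
find "$d" -type f | xargs echo one-call-for-all | wc -l   # prints 1

rm -rf "$d"
```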
Calling find on a mount point is not ideal -- if at all possible, I'd recommend running the same find command on the machine that physically contains the filesystem.
I would still go with option 1. However, if the list of files to be removed is very large, you might get an error; that's because it can exceed the argument-string length that the rm command can handle.
-exec won't create any problem. It's as good as running the rm command directly.
@mirni/vidhyadhar
Since I'm combining the find commands by mtime (there are now only 7 mtime values), the removal list won't get big, and I'm removing it each and every run. Also, is it fine if I use xargs at the end?
Can you please replace the separate day-name tests such as -o -name '*(MON)*' with a single test? Note that -name only takes shell glob patterns, so the (MON|TUE|...) alternation needs find's regex matching instead, e.g. with GNU find:
-regextype posix-extended -regex '.*\((MON|TUE|WED|THU|FRI|SAT|SUN)\).*'
Hope this works for you.
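The alternation idea can be verified like this; a sketch assuming GNU find, whose -regextype posix-extended option enables the (MON|TUE|...) syntax (file names are invented for the demo):

```shell
# Sketch: collapse the day-name -name tests into one -regex test.
# Assumes GNU find (-regextype is a GNU extension).
d=$(mktemp -d)
touch "$d/report (MON).txt" "$d/report (FRI).txt" "$d/notes.txt"

# -regex matches against the whole path, hence the leading/trailing .*
find "$d" -regextype posix-extended -type f \
    -regex '.*\((MON|TUE|WED|THU|FRI|SAT|SUN)\).*' | wc -l   # prints 2

rm -rf "$d"
```

Only the two files with a parenthesized day name match; notes.txt is left out.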
---------- Post updated at 05:29 AM ---------- Previous update was at 05:25 AM ----------
I think having xargs in the command will add a burden to the tuning: first it collects the file names in a buffer and only then removes them, whereas the direct command keeps removing as soon as it finds each file/dir.
@Mirni,
Going by mann2719's statement that xargs will delay the process, should I use the -exec flag instead? @mann2719,
For mtime +14 I have 9 search patterns; shall I combine them into one as you suggested?
xargs will not burden anything. The difference between -exec and | xargs can be significant when there are a lot of arguments, with xargs being the winner. I already said this in the previous post, but let me reiterate in more detail:
find . -exec rm {} \;
will fork() a process for each file. If find returns a million files, you will end up with a million rm commands. This is much more expensive than doing
find . | xargs rm
because this construct will run rm only once for many files; how many depends on your system's argument-length limit, ARG_MAX (defined via limits.h).
Try it for yourself if you don't believe me:
$ ls | wc
37883 37883 367719
$ time find . -maxdepth 1 -type f -exec cp {} dump \;
real 1m16.008s
user 2m0.508s
sys 0m37.818s
$ time find . -maxdepth 1 -type f | xargs cp -t dump
real 0m1.197s
user 0m0.268s
sys 0m0.712s
Newer versions of find also provide a '+' terminator for -exec, which does essentially the same thing as xargs -- it feeds the command as many arguments as it can:
$ time find . -maxdepth 1 -type f -exec cp -t dump {} +
real 0m1.050s
user 0m0.256s
sys 0m0.660s
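The "how many arguments fit" limit above is ARG_MAX, and you can both query it and watch xargs split its input into batches. A quick sketch (the -n 2 batch size is artificial, just to make the splitting visible):

```shell
# The kernel's per-exec argument limit is ARG_MAX (from limits.h);
# getconf reports it at run time.
getconf ARG_MAX          # e.g. 2097152 on many Linux systems

# Force tiny batches with -n to make the splitting visible:
# 6 arguments, 2 per invocation -> echo runs 3 times, 3 lines out.
printf '%s\n' a b c d e f | xargs -n 2 echo | wc -l    # prints 3
```

Without -n, xargs packs as many names per invocation as ARG_MAX allows, which is why it beats -exec \; so badly above.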
The new script takes only 10 hours, 35 minutes and 23 seconds. But it now consistently uses 15-21% CPU, whereas earlier it was well under 10%, so any thoughts on this? I am still testing with -exec ... + and will post the final result for everyone.
A safer alternative is the find ... -print0 | xargs -0 construct, which uses the NUL byte to separate the arguments, so whitespace is no longer special. You can look into that, if it's supported on your system. I recommend doing a benchmark yourself, since it may vary from system to system, but I wouldn't expect to see huge differences in performance.
The -I{} solution ought to work with the same performance unless you have a newline in a filename; if you do, only -print0 can handle that without a hitch.
Hi admin,
When I used the xargs command above ("xargs -I{} rm {}"), I got the error below:
xargs: Missing quote:
When I checked, some file names contain double quotes and some contain single quotes. How do I overcome this? Should I go back to the -exec rm command with the + symbol again?
Thanks
sample file names
./DÉPENSES PAR INDUS. THALES GROUP "EURO"-THALES (CYCLIQUE).HTML
./PRINCIPAUX FOURNI'S. THALES (CYCLIQUE).HTML
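The NUL-separated construct mentioned earlier sidesteps xargs's quote parsing entirely, since quotes become ordinary bytes in the stream. A sketch with quote-laden names like the ones above, assuming find and xargs support -print0 / -0 (GNU and BSD do):

```shell
# Sketch: filenames containing quotes break plain xargs,
# but survive the -print0 | xargs -0 pipeline.
d=$(mktemp -d)
touch "$d/REPORT \"EURO\"-THALES.HTML" "$d/FOURNI'S.HTML"

# NUL-delimited names: no quote or whitespace parsing happens at all.
find "$d" -type f -print0 | xargs -0 rm -f

ls "$d" | wc -l    # prints 0 -- both awkward names were removed
```

The POSIX-portable equivalent is find "$d" -type f -exec rm -f {} +, which never pipes names through a parser in the first place.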