I'm sure this has been asked many times, but a search didn't turn up a definitive best method for this (if there ever is such a thing).
I have been using rsync to back up my main data directory, but I have accumulated a large number of older backups that I don't need. All of the files I no longer need have the extension .back, so I need to trawl through all of the folders and sub-folders and delete everything with the .back extension. I thought I would need to do some kind of recursive ls and pipe the results to rm, but I wasn't sure what that would look like, so I did a search.
Many of the solutions I found use find and look like,
As usual, there appear to be many ways of doing things and I have no basis on which to make a choice. These files are copies, so I could always rebuild the backup if there was a disaster with the cleaning, but that would take time and I try to avoid putting my foot in it to that extent.
Alright, I will give that a go. If you have a minute to answer, what is the difference between the method you posted and the other examples I gave in my original post?
IMO, the first command will not work because the output of the find command is not passed to the exec action in any way. But the second and third should work for sure.
I wasn't sure about the -delete action. That is why I gave you the command I am familiar and experienced with.
Oh dear, I'm classing myself as a geek. Well, if the name fits...
The way you have tried it, the shell will expand *.back before trying to run the command. If you happen to have a file at the top level called this.back then the command actually run will become:-
so you will not actually match anything other than the file at the top level.
The others are various errors. What jaiseaugustine has suggested is the correct format for you. It will pass in *.back as it is to the find command and then it can be used for pattern matching.
If there are no files at the top level, you might get away with it depending on how your shell reacts, but if there is more than one file called *.back, e.g. this.back & that.back, then you will probably get the error:-
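The expansion itself is easy to demonstrate with echo (scratch directory and filenames invented for the example):

```shell
# Create a scratch directory with two matching files.
tmp=$(mktemp -d)
cd "$tmp"
touch this.back that.back

# echo shows what the unquoted pattern turns into before find ever runs:
echo find . -name *.back
# -> find . -name that.back this.back
# GNU find then rejects the extra word with an error along the lines of
# "paths must precede expression".
```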
This is more or less always what I end up doing and I guess it is a reasonable way to proceed in most cases. I keep notes on what I have used for various situations, especially those methods that worked well.
The command,
find . -name "*.back" -type f -exec rm -f {} \;
worked well and cleared out about 50GB of older incremental versions. I ran this while I was out for a while and I didn't run it under time, so I can't comment on how fast the method is compared to other possibilities. I generally presume that there is no fast script-based method to process a directory tree with 4+ million files.
I also did a defrag/optimize (auslogics) and clean out of MFT records. All told it took almost 24 hours to run, but I find I need to keep these backups well maintained, or they eventually bork and you have to reformat and start again. It seems as if rsync tends to lead to very fragmented repositories. I have never quite understood why you get lots of fragmenting on a drive with 500GB of empty space.
Thanks for all the additional explanations. I do always try to understand what a script is doing and why you would choose one method over another. I think I need to read a bit about exec.
As MadeInGermany informs us, the -delete action is available in newer versions, so it will probably run better than -exec rm {} \;, as the latter spawns a new process for each file, and that in itself takes a small amount of time. Multiply that by perhaps 100,000 hits and it suddenly becomes a lot of time spent just creating a new process for each delete.
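To make the process-count point concrete, here are the three forms side by side (scratch tree invented for the example; -delete needs a reasonably modern find):

```shell
# Build a small scratch tree.
scratch=$(mktemp -d)
mkdir -p "$scratch/a/b"
touch "$scratch/a/one.back" "$scratch/a/b/two.back" "$scratch/keep.txt"

# 1) One rm process per matched file -- the costly form on big trees:
find "$scratch" -name "*.back" -type f -exec rm -f {} \;

# 2) Batched form -- rm is called once per large batch of names (POSIX):
#    find "$scratch" -name "*.back" -type f -exec rm -f {} +

# 3) Built-in action -- no external process at all (GNU find):
#    find "$scratch" -name "*.back" -type f -delete

ls "$scratch"   # keep.txt and the now-empty subdirectories remain
```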
To my embarrassment, most of my servers are rather behind the times (AIX 4.3.3 for some) so I've only got this flag in RHEL 6.3.
Interestingly in the RHEL man page, I found this:-
Perhaps a neater way (with examples) of what I was trying to say earlier.
Glad that we could collectively help, and I've learned something too.
Yes, this is Windows XP, but I do the heavy lifting with Cygwin. Nothing beats a Linux toolbox for large-scale file operations. I suppose there may be a DOS equivalent, but I never bothered to learn DOS when I could use Cygwin and learn a real shell like bash instead.
My own experience with rsync (via rsnapshot): my workstation backup runs 6 times per day, and I have been running it for more than a year. The numbers from df and fsck are:
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 241G 35G 194G 16% /media/big_disk_1
big_disk_1: 1,285,708/16,007,168 files (0.3% non-contiguous), 8984671/64000944 blocks
So very low fragmentation for a disk that sees a fair amount of activity. This does not necessarily parallel your use, but I have never run a defragmentation tool on any Linux filesystem. When I used W2K, I seemed to need a defragmentation run quite often.
That's a very inefficient method of deleting a large number of files with Cygwin. The fork required to delete each file performs very, very poorly. Either use -delete or the + version of -exec.
To reduce the calls to programs (like rm), there are two implementations:
1. Unix, later defined by POSIX:
find . -name "*.back" -type f -exec rm -f {} +
I.e. you replace \; by + and the program (here rm) must accept multiple arguments.
This implementation seems difficult; I have met some buggy ones.
2. GNU find, by means of the xargs program that converts an input stream to multiple arguments:
find . -name "*.back" -type f -print0 | xargs -0 rm -f
Note the corresponding -print0 and -0; without them, filenames containing space characters are handled incorrectly.
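A filename with a space shows why the pair matters (invented scratch file; without -print0 and -0, xargs would split the name into two words and try to remove "old" and "copy.back"):

```shell
tmp2=$(mktemp -d)
touch "$tmp2/old copy.back"

# Null-terminated names survive word splitting, so "old copy.back"
# reaches rm as a single argument:
find "$tmp2" -name "*.back" -type f -print0 | xargs -0 rm -f
```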
--
A demonstration of quoting types:
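A minimal sketch of the point, in a directory containing this.back and that.back (scratch files invented for the example):

```shell
tmp3=$(mktemp -d)
cd "$tmp3"
touch this.back that.back

echo *.back      # unquoted: the shell globs -> that.back this.back
echo "*.back"    # double-quoted: literal *.back reaches the command
echo '*.back'    # single-quoted: also literal *.back
pat='*.back'
echo $pat        # unquoted expansion is globbed again -> that.back this.back
echo "$pat"      # quoted expansion stays literal -> *.back
```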