Smart Backup Script

Hello

I'm pretty new to scripting in Linux but am keen to learn.

I'm trying to create a backup script similar to a batch script I developed for Windows (DOS), where the backup goes to a USB drive. I have no problems with the backup process itself, but what I would like to do is automatically remove old files if, say, less than 20 gigabytes (or any hard-coded number) is available.

What I did with the Windows script was have an if statement check whether 20 GB was available; if not, remove the oldest backup, then check again, then remove the oldest again, and so on. It used goto commands to loop, which I know is ugly, and there is no equivalent in the Linux bash world.

I think I have the core parts of the script down pat, which I'll list below, but I'm not sure about the loop part that goes back and checks again. Should I be using a for loop, or something else?

One-liner to give me the free space available on the USB drive:

df |grep usb |awk '{print $4}'

One-liner to remove the oldest backup file (tgz file):

ls -S *.tgz |tail -n1 |xargs rm

One-liner to check if the free space is less than a hard-coded number (or ideally a variable), using the above:

if [ $(df |grep usb |awk '{print $4}') -lt 20000000 ]; then ls -S *.tgz |tail -n1 |xargs rm; fi

I know I should be building this all into functions. Could someone help with this?

---------- Post updated at 12:05 PM ---------- Previous update was at 09:39 AM ----------

Basically, after I've run the last command I posted (the all-in-one-line command), how do I then go back and check again?

check if 20G is available
>> if not, go back, remove another file and check again
>> if so, 20G is available; carry on and do the backup

I used to use a goto command in the Windows batch script, but this isn't available in bash. I know it's possible; I'm just after pointers or example scripts on how to loop this.

Doing it this way is quite nice, as then I'm not reliant on whatever capacity drive a user may put in; it will always make space on the device, with a few other checks in place as well.

A couple of things come to mind looking at your examples.

First, the -S option sorts by size; you'll not be removing the oldest file first, but the largest, which is likely the most recent and probably not what you want.

Secondly, I'd switch to the directory that has your backup files in it and use a dot (.) on the df command; this ensures that you are checking the correct device while still allowing it to be variable.
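For example (assuming the backups live under /media/usb0/backups; substitute whatever path you actually use):

cd /media/usb0/backups
df .        # reports the filesystem that the current directory lives on

That way the script doesn't care whether the drive happens to be mounted as usb0, usb1, or anything else.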

This is the way that I would go about it; it needs only one df call and one ls call, which will be more efficient:

#!/usr/bin/env ksh

backup_dir=${1:-foo}
if ! cd "$backup_dir"
then
        echo "cannot switch to backup directory: $backup_dir" >&2
        exit 1
fi
(
        df -B 1 . | tail -1
        ls -rlt *.tgz
) | awk -v need_free=${2:-20} '
        BEGIN {
                need_free *= 1024 ^ 3;          # number on the command line is GiB (default 20)
        }
        NR == 1 {                               # output from df; adjust need based on available
                need_free -= $4;                # $4 is the space df reports as available
                next;                           # done with the df line; move on to the ls lines
        }

        NF < 5 { next; }                # don't catch total from ls

        {
                if( need_free <= 0 )    # desired free space reached; stop listing files
                        exit( 0 );
                print $NF;              # not there yet; add this file (oldest first) to the delete list
                need_free -= $5;        # $5 is the file size from ls -l; reduce what we still need to free
        }
'| xargs rm

This is just a basic example. With anything that deletes files, I always like to have a 'no-exec' mode that lists what it would do, until I'm comfortable that it works right. In any case, you might want to add something that limits the maximum number of files it can delete, or some such check that prevents it deleting all of the backup files.
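If you save it as, say, trim_backups.sh (the name is just an example) and make it executable, you would call it with the backup directory and the number of GiB you want free, something like:

chmod +x trim_backups.sh
./trim_backups.sh /media/usb0 20        # make sure at least 20GiB is free under /media/usb0

For a dry run while you're testing, change the final "| xargs rm" to "| xargs echo rm" so it only prints what it would delete.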

The -B 1 option on the df command causes it to list values in bytes, which makes the arithmetic easier. The -rlt options to ls give a long listing sorted by modification time in reverse order, so the oldest file is listed first and its size is shown. Both the df and ls commands are executed in a subshell so that their combined output can easily be piped into the awk that does the real work.
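If you'd rather keep your original one-liners and just wrap them in a loop (which is really all the batch goto was doing), a plain while loop does the job. A rough sketch, using your df|grep line and the -t version of your ls command; adjust the threshold and the grep pattern to suit:

#!/bin/bash

want_free=20000000      # kilobytes we want free (roughly 20GB)

# keep deleting the oldest .tgz until enough space is free,
# or until there are no backups left to delete
while [ "$(df | grep usb | awk '{print $4}')" -lt "$want_free" ]
do
        oldest=$(ls -t *.tgz 2>/dev/null | tail -n1)
        if [ -z "$oldest" ]; then
                echo "no more backups to remove and still not enough space" >&2
                break
        fi
        rm -- "$oldest"
done

# ... carry on with the backup here ...

The condition is re-evaluated on every pass, so it behaves just like your check/remove/goto pattern, and the break stops it from looping forever if the directory runs out of .tgz files.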

Hope this is useful

Thanks for the reply.

Yeah, sorry, I copied the wrong command; the ls command should have the -t switch to sort by time.

This is an Ubuntu Lucid server box.

OK, to give some of the output I'm getting: a straight df command gives me

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0             230582364  41795744 177073652  20% /
none                   2020888       220   2020668   1% /dev
none                   2025440         0   2025440   0% /dev/shm
none                   2025440      1708   2023732   1% /var/run
none                   2025440         0   2025440   0% /var/lock
none                   2025440         0   2025440   0% /lib/init/rw
/dev/sdd1            488384032     81104 488302928   1% /media/usb0

I'm only interested in the USB drive and its free space. (I was thinking of maybe using the percentage, but that would be too complicated.) I grep for "usb", although I could make this more accurate. So I use:

df |grep usb |awk '{print $4}'

which in the above example gives me

488302928

Sweet, the free space in kilobytes. Using this, I want to compare against the amount of free space I want to have available, say 20G or ~20000000 kilobytes (clearly it's all good at the moment, but it will fill up).

Now, the file I want to remove can be identified with

ls -t *.tgz |tail -n1

run in whichever folder I'm interested in removing files from.

Thanks for the example code, but I really don't understand a lot of it. I'm not sure what the "next" command is (or is it a variable?), and I don't see how it does the checking.

The way I have done it before is basically:

: start
check if more than 20G is available
>> if false, remove the oldest backup file and goto start
>> if true, drop out of the loop and carry on with the backup

So I've got the check for 20G down pat, and I've got removing the oldest file down pat, but I haven't got the looping.

I've looked at other examples of backup scripts, but a lot of them just keep, say, only 4 backups on the media, rather than dynamically working out how much space is needed.

Cheers,