Script to tar/rsync/rm multiple folder names

hi all,

i attach a link with what im trying to do automatically via script but i have some questions i need answering please, bear in mind i am really new to bash scripting, the only thing i know how to do is commands in scripts like cd rm tar rsync cp stuff like that

i have multiple project folders in the "to_be_archived" folder ie

batman
superman
hulk
spiderman
iron_man
etc etc...

so it makes a tar file of the folder, rsyncs tar file to another folder "archived_projects" and deletes the tar file and folder in "to_be_archived"

my example i just did a test with a untitled folder to check to see if it works and it does

i want it to -

do all folders in the list and not just one specific folder, so once it does the 1st one it does the 2nd one so on so forth

i want to check before it deletes the folder and tar file in the (to_be_archived) i want to double check if it has rsynced the whole file across to (archived_projects)

cheers,

rob

Welcome to the forum. There is no problem with being a beginner. We are here to help you (learn). How about i give you some pointers and you try to fill in the rest?

I take it, you want a separate tar-file for each folder, i.e. a batman.tar , a superman.tar , and so on, yes?

If so: this is really easy to set up by using a loop. Here is how to do it:

for DIR in /some/dir/to_be_archived/* ; do
    echo $DIR
done

This will cycle through all directory entries of /some/dir/to_be_archived/* (in scripts it is always better to use absolute paths instead of relative ones) and set the variable "$DIR" to each value for every run through the loop. As it is there is only one command - echo $DIR - but it shows the mechanism.

Because you will probably want to use the last part of the path (the folder's own name) for naming the tar file, you need to extract this part first:

for DIR in /some/dir/to_be_archived/* ; do
    echo full DIR is: $DIR
    echo dir name : ${DIR##*/}
done

full DIR is: /some/dir/to_be_archived/batman
dir name : batman
full DIR is: /some/dir/to_be_archived/superman
dir name : superman
full DIR is: /some/dir/to_be_archived/hulk
dir name : hulk
[....]

Now let us construct some commands around this:

cd /some/dir/to_be_archived
for DIR in /some/dir/to_be_archived/* ; do
    fSaveDir="${DIR##*/}"
    echo tar cvf /some/dir/to_be_archived/${fSaveDir}.tar ./${fSaveDir}
done

tar cvf /some/dir/to_be_archived/batman.tar ./batman
tar cvf /some/dir/to_be_archived/superman.tar ./superman
tar cvf /some/dir/to_be_archived/hulk.tar ./hulk
[....]

If this is what you want, remove the "echo" so that the command - instead of being displayed - is executed. The same way you can add more commands inside the loop and use "${fSaveDir}" wherever you would enter the directory's name.

This is understandable, but there is a better way to do this: UNIX-commands always set a "return code" upon their exit. This return code is 0 (zero) when the command was successful, something else when not. You can simply check this return code after each command and do something (write an error message, exit the script, ....) if it isn't 0. There is a special variable "$?" for this, but you can use an "if"-statement as well (i suggest you try it out with a few commands). First the command version, then the same within an if-statement:

# ls /etc/hosts ; echo $?
/etc/hosts
0

# ls /some/file/which/doesnot/exist ; echo $?
/some/file/which/doesnot/exist not found
2

if <some command> ; then
     echo "this command returned 0."
else
     echo "this command returned non-zero."
fi
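
To try both forms side by side, here is a small sketch. The function name check_path is made up for this illustration, and the second path is assumed not to exist on your system:

```shell
#!/bin/bash
# check_path is a made-up helper for this illustration only.
# It lets "if" test the return code of ls directly.
check_path() {
    if ls "$1" > /dev/null 2>&1 ; then
        echo "ls $1: returned 0"
    else
        echo "ls $1: returned non-zero"
    fi
}

# Form 1: run the command, then look at $? straight away.
ls / > /dev/null 2>&1
rc=$?
echo "return code of 'ls /': ${rc}"

# Form 2: the same check wrapped in an if-statement.
check_path /
check_path /some/file/which/doesnot/exist
```

Note that "$?" is saved into a variable immediately, because the next command (even an echo) would replace its value.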

This will make your script look like this, with a possible outcome below:

cd /some/dir/to_be_archived
for DIR in /some/dir/to_be_archived/* ; do
    fSaveDir="${DIR##*/}"
    if tar cvf /some/dir/to_be_archived/${fSaveDir}.tar ./${fSaveDir} ; then
          echo "tar-command for ${fSaveDir} successful"
    else
          echo "tar-command for ${fSaveDir} failed, aborting."
          exit 2
    fi
    if <next-command> ${fSaveDir} ; then
          echo "<next-command> for ${fSaveDir} successful"
    else
          echo "<next-command> for ${fSaveDir} failed, aborting."
          exit 3
    fi
    if
        [....]
    fi
done

tar-command for batman successful
<next-command> for batman successful
<other-command> for batman successful
[...]
tar-command for superman successful
<next-command> for superman successful
<other-command> for superman failed, aborting.

This would not only tell you what exactly worked but also where the script failed.
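
The original question also asked for a double check that the whole tar file has arrived before anything is deleted. The return code of rsync normally covers this. If an explicit comparison is wanted on top of it, one possibility (an assumption of this sketch, not something taken from the script above) is cmp with the -s option, which compares two files byte by byte and only sets a return code. The helper name verify_copy and the file names are made up:

```shell
#!/bin/bash
# verify_copy is a hypothetical helper, not part of the script above.
# It returns 0 only if both files exist and are byte-for-byte identical.
verify_copy() {
    cmp -s "$1" "$2"
}

# Throw-away demonstration in a temporary directory.
workdir=$(mktemp -d) || exit 1
echo "some project data" > "${workdir}/batman.tar"
cp "${workdir}/batman.tar" "${workdir}/copy.tar"

if verify_copy "${workdir}/batman.tar" "${workdir}/copy.tar" ; then
    echo "copy is identical, safe to delete the original"
else
    echo "copy differs, keeping the original"
fi

rm -rf "${workdir}"
```

In the real loop the two arguments would be the tar file in to_be_archived and its copy in archived_projects, and the rm commands would go into the "then" branch.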

If you still have questions feel free to ask.

I hope this helps.

bakunin

thank you so much for your help, this is a great template to learn from

i need to figure out what each syntax you put does, i know what some mean

yes i do want a separate tar file for each folder (ie whatever project folder is in the to_be_archived i want it to have that folder name.tar)

what about the rsync and rm commands in my screenshots?

cheers,

rob

---------- Post updated at 10:51 AM ---------- Previous update was at 09:30 AM ----------

getting there

#!/bin/bash
cd /to_be_archived/
for DIR in * ; do
fSaveDir="${DIR##*/}"
tar -cf "${fSaveDir}".tar "${fSaveDir}"
rsync -a "${fSaveDir}".tar /archived_projects/
rm -f "${fSaveDir}".tar
rm -rf "${fSaveDir}"
done

how did you know this command

fSaveDir="${DIR##*/}"?

Excellent!

Just a few points you might want to observe as you go along:

First, it is common style to indent code: every conditional statement and every looping statement triggers one (more) level of indentation. The reason is that it makes the loop body and the conditionally executed statements stand out. Software (any software - even 5-line-scripts) is written to be maintained easily and indentation helps to grasp faster what the code does. Suppose you write your script, don't look at it for some months and then want to change something: you won't have its "inner organisation" as present in your mind as you have it now.

Another thing is: like you organize a longer text into paragraphs to make it easier to read you can put empty lines into the code to group parts of the commands. I would have written your script this way:

#!/bin/bash

cd /to_be_archived/
for DIR in * ; do
     fSaveDir="${DIR##*/}"

     tar -cf "${fSaveDir}".tar "${fSaveDir}"
     rsync -a "${fSaveDir}".tar /archived_projects/
     rm -f "${fSaveDir}".tar
     rm -rf "${fSaveDir}"
done

Second: "tar" is a command which takes subcommands. "c" and "f" are such subcommands, not options. They are therefore NOT introduced with a dash. Yes, this is inconsistent and, yes, most tar versions tolerate the dashes anyway because the misuse is so common, but still: correct is this:

tar cf /target/file.tar /directory/to/process

and not this:

tar -cf /target/file.tar /directory/to/process

Finally: your script relies on a certain environment being set. Enter env at the command line and you will see many variables being set to certain values. All these variables are also set inside your script when it runs. You rely, for instance, on a certain value of "PATH", because tar is usually /usr/bin/tar . This implicitly set environment may not always be there, especially when you put this script into cron to have it executed automatically from time to time. Do yourself a favour and set the environment you need explicitly:

#!/bin/bash

PATH="/usr/bin:/usr/local/bin:<or whatever else you may need>"
export PATH
<other variables you may need inside your script>

cd /to_be_archived/
for DIR in * ; do
     fSaveDir="${DIR##*/}"

     tar -cf "${fSaveDir}".tar "${fSaveDir}"
     rsync -a "${fSaveDir}".tar /archived_projects/
     rm -f "${fSaveDir}".tar
     rm -rf "${fSaveDir}"
done

The construct you asked about, "${DIR##*/}", is called "parameter expansion" or "variable expansion" and you will find it (like most other things i touched upon) in the man page of your shell. Enter man bash at the command prompt or do a google search.

This specific expansion is "##" which means: take the pattern following after that and remove the longest part of the variable's content that matches the pattern, starting from the left. Because "*/" matches everything up to and including the last "/", what remains is the directory's name. There is also a method for cutting from the rightmost part of the content, see below:

Try the following at the commandline:

x="a/b/c/d/e"            # our content to play with
echo "${x##*/}"         # gives you "e" - the longest possible match for "*/" is cut off leftside
echo "${x#*/}"         # gives you "b/c/d/e", the shortest possible match
echo "${x%/*}"         # gives you "a/b/c/d", the shortest possible match for "/*" is cut off rightside
echo "${x%%/*}"        # will give you "a", the longest possible match for "/*"

I hope this helps.

bakunin

thank you bakunin

i understand why you put the executable paths in but what is this line for -

export PATH

also when i do this command i get a text file of the project folder name and size of each folder but how do i make it do file counts of each directory?

du -h /to_be_archived/* >> /archive

and i will add this code just before and after the lines

cd /to_be_archived/
i will put it here
for DIR in * ; do

many thanks,

rob

"export PATH" means: if this shell happens to open another shell (for instance by calling another script) the current value of the exported variable will be known there. Right now there is no such other shell called so this is not necessary. But in case you do call another shell at some time in the future it will be good to have it there so you can't forget to add it. Again: maintainability at work.
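
A quick way to see the difference, as a sketch only: both variable names are made up, and it is assumed that neither is already present in your environment.

```shell
#!/bin/bash
# Both variable names are made up for this demonstration.
local_only="set but not exported"
shared="set and exported"
export shared

# The single quotes make sure the variables are expanded by the
# child shell, not by this one. The child only knows "shared".
child_output=$(bash -c 'echo "local_only=[${local_only}] shared=[${shared}]"')
echo "${child_output}"
```

Under that assumption the child shell prints an empty value for local_only and the full text for shared.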

The same way you cycle through the subdirectories of "/to_be_archived": using the for-loop for tarring, rsyncing and everything else. Like this (i put a header line in for every directory to make it easier to read - it serves no other purpose so feel free to modify the format or drop it altogether if you don't like it):

cd /to_be_archived/
for DIR in * ; do
     fSaveDir="${DIR##*/}"

     echo "------------- files from ${fSaveDir}" >> /archive
     du -h /to_be_archived/"${fSaveDir}"/* >> /archive

     tar [....rest of your code....]
done
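
The du command above reports sizes, not counts. For the file counts that were asked about, one common approach is to let find list the regular files and wc count the lines. This is only a sketch: count_files is a made-up helper name, the demonstration tree is created in a temporary directory, and file names containing newlines would be counted more than once.

```shell
#!/bin/bash
# count_files is a hypothetical helper: it prints the number of
# regular files found below the directory given as first argument.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Throw-away demonstration tree with three files in two projects.
demo=$(mktemp -d) || exit 1
mkdir -p "${demo}/batman/shots" "${demo}/hulk"
touch "${demo}/batman/a" "${demo}/batman/shots/b" "${demo}/hulk/c"

for DIR in "${demo}"/* ; do
    echo "${DIR##*/}: $(count_files "${DIR}") files"
done

rm -rf "${demo}"
```

In the real script the echo line would be redirected with ">> /archive" just like the du line.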

I hope this helps.

bakunin

aahhh this programming bash script language is over my head, can you please advise of any good how tos for complete dummies as this is all going over my head

sorry

---------- Post updated at 11:50 AM ---------- Previous update was at 08:08 AM ----------

im messing about with the error code now, i want to see if its possible to email me if a certain command fails

---------- Post updated 04-15-16 at 05:37 AM ---------- Previous update was 04-14-16 at 11:50 AM ----------

rsync -a "${fSaveDir}".tar /archived_projects/

new code goes here

rm -f "${fSaveDir}".tar

in between these two lines of code i want it to either continue the script if the last command returned a code 0 (aka successful) but if it returns any other number, quit the script and email me

i know this is possible but i dont know how to implement it

See my post #2 in this thread, where exactly this is explained. Replace the "echo"-commands i put in to show the mechanism with some other commands, namely the mail -command if you want to send an email. (And, as always, see man mail for details about how to use the command.)
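
As a rough sketch of what that replacement could look like: report_failure is a made-up helper name, the address is a placeholder, and it is assumed that your mail command accepts "-s subject" and reads the message body from standard input (check man mail on your system).

```shell
#!/bin/bash
# Placeholder address, replace it with your own.
MAILTO="you@example.com"

# report_failure is a hypothetical helper.
# $1 = project name, $2 = name of the step that failed
report_failure() {
    echo "project $1 aborted: $2 failed" | mail -s "archive error: $1" "${MAILTO}"
}

# Intended use inside the loop (shown as a comment so that this
# sketch does not touch any real data):
#
#     if ! rsync -a "${fSaveDir}.tar" /archived_projects/ ; then
#          report_failure "${fSaveDir}" "rsync"
#          exit 3
#     fi
```

Keeping the mail command in one function means the address and the subject format have to be changed in one place only.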

I hope this helps.

bakunin

awesome its looking really good now -

#!/bin/bash
cd /to_be_archived/
for DIR in * ; do
fSaveDir="${DIR##*/}"
tar -cf "${fSaveDir}".tar "${fSaveDir}"
rsync -a "${fSaveDir}".tar /archived_projects/
if [ $? -ne 0 ]
then
mail -s "${fSaveDir}" robertw@molinare.co.uk <<< "project "${fSaveDir}" aborted due to error"
else
rm -f "${fSaveDir}".tar
rm -rf "${fSaveDir}"
fi
done

few questions tho,

lets say i have tons of projects in the "to_be_archived" folder and the script runs and it fails on a job number 2 out of 20 does that mean the script will exit out completely or will it carry on with the other folders?

few answers:

first, the script will continue, because it will exit only if you tell it to. Look at its code (properly indented, as i told you to do) and its structure becomes clear (i have shortened some lines to not overflow the screen):

#!/bin/bash
cd /to_be_archived/
for DIR in * ; do                V do this for every value of DIR:
     fSaveDir="${DIR##*/}"       |
     tar -cf [...]               |
     rsync -a [...]              |
     if [ $? -ne 0 ] ; then      |  V do this if the previous cmd failed:
          mail -s [...]          |  |
     else                        |    and this, if not:
          rm -f [...]            |  |
          rm -rf [...]           |  |
     fi                          |
done

You see what it does - it will do exactly that, nothing more, i promise! If you don't tell it to exit under certain circumstances, it won't (should you want that: put an "exit" command in somewhere).

For a beginner you have done very good work. (sorry, if that sounds condescending, it isn't meant that way. Programming is like playing chess: you spend half an hour to learn the rules and then a lifetime to really play the game well.)

Now for some of the finer points: the variable "$?" is set anew after each command. This means that this part:

tar -cf "${fSaveDir}".tar "${fSaveDir}"
rsync -a "${fSaveDir}".tar /archived_projects/
if [ $? -ne 0 ] ; then
     [....]

might not exactly do what you want: suppose the tar-command encounters a problem. It would exit, set an error code which you do not observe, then the rsync-command will start and eventually do its job and you will never be notified about the problem the tar-command had. Even worse, the rsync-command would overwrite the last good copy you had in the archive with the faulty tar-file you just produced.
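
The effect can be reproduced safely with the shell built-ins false (always fails) and true (always succeeds) standing in for tar and rsync. This is only an illustration, the variable names are made up:

```shell
#!/bin/bash
# Without saving: the failure of the first command is lost.
false                 # stands in for a failing tar
true                  # stands in for a succeeding rsync
status_seen=$?
echo "status seen by the check: ${status_seen}"

# With saving: capture $? right after the command you care about.
false                 # stands in for a failing tar
tar_status=$?
true                  # stands in for a succeeding rsync
echo "saved status of the first command: ${tar_status}"
```

The first echo reports 0 although the first command failed. The second one reports a non-zero value because the status was saved before the next command could replace it.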

A big part of the art of script programming is to foresee what could possibly go wrong and take measures against that. You do not necessarily have to program a solution to a problem, but you want to become aware of it!

If a script has to do a, b, c and d you want to see as outcome that it did a and b, couldn't do c (ideally by giving the reason why) and therefore did not even try d. The last thing you want is the script to not recognize it didn't do c, attempt to do d (which would make no sense at all because it builds on the result of the previously run c), eventually "succeed" doing it (but with a completely unusable result) and telling you "all done successfully".

Look back at my model script from post #2: i looked at the return code of every command and put different error information for each command failing in. You do not have to exit the script as i did, but you should do something about a failing command and most probably stop the processing of the part at hand.

I hope this helps.

bakunin

Makes sense, so if i dont put an exit command it will continue regardless?

So really i want an exit line here

mail -s "${fSaveDir}" robertw@molinare.co.uk <<< "project "${fSaveDir}" aborted due to error"
exit
else

and also i want to do another if, then, else statement for the tarring?

lets say i have multiple project folders in the "to_be_archived" folder and this script runs on the 1st folder and produces an error code for the tar and emails me and exits

will it exit the script, or will it carry on and do the 2nd project folder?

If you're in a loop (such as for or while or until ) and you want to exit the script containing the loop, use exit .

If you're in a loop and you want to stop processing the loop and continue with program steps after the loop, use break .

If you're in a loop and you want to stop processing the current iteration of the loop and continue with the next iteration, use continue . This is only needed if there are other steps in the current iteration of the loop you want to skip. The default behavior when hitting the done at the end of a loop is to continue with the next iteration of the loop if there are additional values to be processed by the loop.
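
A tiny loop makes the difference visible. This is just a sketch with made-up values; demo_loop is a hypothetical name:

```shell
#!/bin/bash
# demo_loop is a made-up function that only prints what it does.
demo_loop() {
    for n in 1 2 3 4 5 ; do
        if [ "$n" -eq 2 ] ; then
            echo "skipping $n"
            continue          # go on with the next value (3)
        fi
        if [ "$n" -eq 4 ] ; then
            echo "leaving the loop at $n"
            break             # jump to the first command after "done"
        fi
        echo "processing $n"
    done
    echo "after the loop"
}

demo_loop
```

Values 1 and 3 are processed, 2 is skipped, and the loop ends at 4 so that 5 is never reached. With exit in place of break the line "after the loop" would not be printed either, because the whole script would end at that point.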

Can i just use an if statement and not an if or else statement,

If / Else Statements (Shell Scripting) - Code Wiki

If you only care about what happens when the if condition is true and don't want to do anything if the condition is false, then you don't need the else clause in your if statement.

so it will continue with my script even if i dont specify an ELSE statement and just use an IF statement, unless i include in my IF statement an EXIT statement?

because all im going to care about is if it doesnt succeed in doing the tar, rsync and, rm commands so im just going to do if doesnt equal 0 email me and exit the script

---------- Post updated at 07:00 PM ---------- Previous update was at 05:27 PM ----------

im using just the if statements for all the commands as i want to be notified if any commands fail, but for the last command i have done an else as well

its working great

#!/bin/bash
cd /to_be_archived/
for DIR in * ; do
fSaveDir="${DIR##*/}"
tar -cf "${fSaveDir}".tar "${fSaveDir}"
if [ $? -ne 0 ]
then
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "creating of tar "${fSaveDir}" failed due to error"
exit
fi
rsync -a "${fSaveDir}".tar /archived_projects/
if [ $? -ne 0 ]
then
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "rsync of "${fSaveDir}" failed due to error"
exit
fi
rm -f "${fSaveDir}".tar
if [ $? -ne 0 ]
then
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "removing of tar "${fSaveDir}" failed due to error"
exit
fi
rm -rf "${fSaveDir}"
if [ $? -ne 0 ]
then
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "removing of "${fSaveDir}" failed due to error"
exit
else
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "successfully completed archiving "${fSaveDir}""
fi
done

A few notes on your code and a possible alternative for you to consider...

  1. Learn to indent your code as bakunin has suggested several times. It not only looks more professional, it makes it easier for anyone (including you) to read and understand what your code is trying to do, and makes it easier to spot missing syntax elements (like a do or a done or a then or a fi ). I have a coding style that is a little bit different from bakunin's style. I don't care what style you use as long as you pick one and use it consistently.
  2. Since you cd into the directory containing the directories you want to process and use for DIR in * , $DIR will never expand to a string containing a <slash> character and, therefore, fSaveDir will always contain exactly the same string as DIR . Therefore, I have gotten rid of fSaveDir and just use DIR .
  3. Since you are creating tar archives in the directory containing your project directories, and you don't remove them if something fails, you need to verify that the file you are processing ( $DIR ) is a directory. There are several ways to do that. I have chosen to simply test for non-directory file in an if statement and use a continue to silently skip over non-directory files.
  4. The sendmail utility (at least on my system) does not treat its 1st operand as a subject line (and has no option to set a subject line on the command line). I don't see why you would want a separate e-mail message for each project processed, and I don't see why you would want to stop all processing if you hit one error. The code below uses mailx instead of sendmail and only sends one message containing the status for each project directory processed by one invocation of your script.
  5. As noted before by bakunin, c and f are not options to tar and should not be preceded by a hyphen.

Please note that none of this has been tested in any way, but I think it comes close to doing what you are trying to do:

#!/bin/bash
cd /to_be_archived/
for DIR in *
do	if [ ! -d "$DIR" ]
	then	# This file is not a directory, skip to next file...
		continue
	fi
	printf 'Processing project: %s\n' "$DIR"
	if ! tar cf "${DIR}".tar "${DIR}"
	then	printf 'Creating "%s.tar" failed.\n' "$DIR"
		continue
	fi
	if ! rsync -a "${DIR}".tar /archived_projects/
	then	printf 'rsync of "%s.tar" failed.\n' "$DIR"
		continue
	fi
	if ! rm -f "${DIR}".tar
	then	printf 'Removing "%s.tar" failed.\n' "$DIR"
		continue
	fi
	if ! rm -rf "${DIR}"
	then	printf 'Removing project "%s" failed.\n' "$DIR"
		continue
	fi
	printf 'Project "%s" successfully archived and removed.\n' "${DIR}"
done 2>&1 | mailx -s "${0##*/} Status report $(date)" robertkwild@gmail.com

But surely i dont want to continue the script if it fails on the taring rsync or removing

Thats why i put an exit statement instead so if fails it can let me know via email and exit the script

Also lets say i have multiple folders i want the script to run lets say it exits on the first tarring of folder 1, will it try and do folder 2 and so on so forth or will it exit altogther

Please look at post #12 in this thread again!

If you have five projects to archive and remove, and the tar on the first one fails; if you use exit , your script will exit and there will be absolutely no attempt to tar , rsync , and rm the other four projects. That is why I used continue instead of exit . Using continue , the script I suggested will stop processing the first project (if any step fails) but will then continue processing other projects until all five projects have been processed. When it is done processing all five projects (successfully or unsuccessfully), it will send you one e-mail giving you the status of all five projects (including the normal and diagnostic output produced by each command run in the script for each project). Isn't that what you want to do?
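
How the "done 2>&1 | mailx" part collects everything into one message can be tried without sending any mail. In this sketch send_report is a made-up stand-in for mailx that simply prints what it receives, and the project names and the simulated failure are invented for the demonstration:

```shell
#!/bin/bash
# send_report is a hypothetical stand-in for mailx: it prints a
# header and then everything it receives on standard input.
send_report() {
    echo "=== one report ==="
    cat
}

process_all() {
    for DIR in batman superman hulk ; do
        if [ "$DIR" = "superman" ] ; then
            echo "archiving $DIR failed." >&2   # diagnostics go to stderr
            continue
        fi
        echo "archiving $DIR succeeded."
    done 2>&1 | send_report
}

process_all
```

Because the redirection and the pipe are attached to "done", the normal and the diagnostic output of every iteration end up in the same single report, failed projects included.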

ok thanks,

i thought exit would exit folder 1 if it failed any steps and continue doing folder 2

so when you say continue, it wont continue with folder 1 if it failed any steps but instead it will go to folder 2 and once its done all 5 it will give you an email of all the folders its succeeded or failed?

so my coding

tar -cf "${fSaveDir}".tar "${fSaveDir}"
if [ $? -ne 0 ]
then
sendmail "${fSaveDir}" robertkwild@gmail.com <<< "creating of tar "${fSaveDir}" failed due to error"
exit
fi

if i change the exit to a continue, will it stop working on folder 1 and not even bother doing the other commands (rsync and rm) and start on working on folder 2?

No, No, NO! Calling exit terminates your script; not just the current iteration of the loop containing it.

Yes. Calling continue causes the script to skip any remaining steps in the current iteration of the loop and continue processing starting at the beginning of the next iteration of the loop that contains it.