Help with file compare and move script

I'm running debian (the raspbian version of it) and working on a script to compare files in 2 directories, source and target, move files with duplicate names to a 3rd directory, then move remaining files in source to target. I can't get the syntax right, keep getting syntax errors and can't get past the file comparison stage to start figuring out the move portion. I thought I'd print the results to start, to see if it's working.

This isn't intended to be a command line script, it's intended to run automatically on boot so there shouldn't be any user intervention required. I found several scripts that require a user to input directories when they're run, then delete duplicates. I took what looked like the most easy to understand one and am trying to modify it. I was mistaken about the simplicity.

Any advice or hints would be greatly appreciated.

#!/bin/bash
# Compare file names in source and target directories
# Move duplicates from source to duplicates directory
# Move remaining files in source to target directory
# Only care about files names, not upper lower case, checksum, date, time

dir1="/mnt/nas/source"
dir2="/mnt/nas/target"
dir3="/mnt/nas/Duplicates"

for file in $dir1 ;
  do [ -f $dir2 ${file} ] && echo ${file} ;
done

-f ${dir2}/${file}

The above won't work because the for loop doesn't supply file names, just the directory name. If you append /* to the directory, full path names are supplied instead. You then have to either cd into the source dir, or strip off the path part:

cd "$dir1"
for FN in *; do ...; done

OR

for FN in "$dir1"/*; do echo mv "${dir2}/${FN##*/}" "$dir3"; done
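
To see the difference between looping over the bare directory variable and looping over a glob, here is a minimal sketch. It assumes a throwaway directory created with mktemp; the directory, the two file names and the function names are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical demo directory with two made-up files, one containing a space.
demo=$(mktemp -d)
touch "$demo/a.jpg" "$demo/b c.jpg"

# Without a glob the loop body runs once, with the directory name itself.
count_plain() {
    local n=0 item
    for item in $demo; do
        n=$((n + 1))
    done
    echo "$n"
}

# With "$demo"/* the loop runs once per directory entry, spaces included.
list_names() {
    local path
    for path in "$demo"/*; do
        printf '%s\n' "${path##*/}"   # strip everything up to the last slash
    done
}

count_plain
list_names
```

The ${path##*/} expansion is the "strip off the path part" step: it removes the longest prefix ending in a slash and leaves only the file name.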

Thanks for the advice above, it got me further along but I've run into another wall or two. Here's where I'm at so far

#!/bin/bash
# Compare file names in source and target directories
# Move duplicates from source to duplicates directory
# Move remaining files in source to target directory
# Only care about files names, not upper lower case, checksum, date, time

cd /mnt/nas/source
dir1=$(ls *.*)
cd /mnt/nas/target
dir2=$(ls *.*)
cd / 
dir3="/mnt/nas/Duplicates"

for FN in $dir1;
do
    if [ -f "$dir2 -eq ${FN}" ];
        then echo "Duplicate "${FN};
        else echo "Unique "${FN};
    fi;
done

The first problem is I can't get the IF statement to work no matter where I put quotes, parens, brackets, or curly brackets. Also, switching -eq for =, ==, or / makes no difference. I get either a "too many parameters on line 16" error or it drops right through and declares every file unique when 10 out of 20 are duplicates.

Second problem is I have a file named 'space test dupe10.jpg' that I named to see how it would handle spaces in file names. I have a simplified version of the script (no IF statement) that just echoes the variables as it increments through them, and it appears to treat space, test, and dupe10.jpg as 3 different files.

This isn't a life or death situation so I greatly appreciate any and all advice. I'm just updating an electronic picture frame that hangs on the living room wall and runs for 4 hrs per night, which makes adding pictures a pain since it has to be done while it's on. With this script I can put the pics on my NAS and they'll get copied over when the frame boots. I can do that now but duplicate file names are a concern. I made this thing before you could buy them, from instructions in a physical Popular Mechanics magazine. 15 or so years and several thousand pics later, you can imagine how many times my wife (a photography hobbyist no less) has tried to load "flowers.jpg" on it.

This demo program might help you to your solution:

dir1="holiday.jpg camping.jpg beach.jpg"
dir2="xmas.jpg holiday.jpg"

for FN in $dir1
do
    found=0
    for FO in $dir2
    do
        if [ $FN = $FO ]
        then
            found=1
        fi
    done

    if [ $found -eq 1 ]
    then
        echo "Duplicate $FN"
    else
        echo "Unique $FN"
    fi
done

Output:

Duplicate holiday.jpg
Unique camping.jpg
Unique beach.jpg

First off: you are getting along fine. We all had about the same issues you are experiencing right now when we learned our trade, so don't worry - keep trying and you will surely be giving the answers here instead of asking them.

Let us get to the first problem:

It helps to understand problems like this by picturing how the shell works: a script is basically a list of commands the shell will enter on your behalf line by line. Therefore you can "debug" your code the same way: open a new terminal window and paste the relevant pieces in, line by line - then see what happens. Furthermore there is a device you might find very useful: set -xv and set +xv . The first one switches on, and the second one switches off, a feature that shows every command as it will be executed in a script. So, you could insert into your script:

[...]
    set -xv
    if [ -f "$dir2 -eq ${FN}" ];
        then echo "Duplicate "${FN};
        else echo "Unique "${FN};
    fi;
    set +xv

to see exactly what happens between these lines and what the respective values for the variables are at that point.
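
As a small self-contained illustration of the tracing, here is a sketch you can paste into a terminal. The function name trace_demo and the file name are assumptions made up for this demo; only set -x and set +x come from the text above.

```shell
#!/bin/bash
# trace_demo is a hypothetical function; the file name is made up and is
# assumed not to exist in the current directory.
trace_demo() {
    local FN="space test dupe10.jpg"
    set -x                                   # start tracing (written to stderr)
    [ -f "$FN" ] || echo "not a regular file: $FN"
    set +x                                   # stop tracing
}

trace_demo
```

The trace lines show the [ command with its arguments after all expansions, so you can see whether the file name arrived as one word or as several.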

The next thing is your quoting: quoting in shell is necessary to prevent something called "field splitting" - the shell splits every command line into fields ("words") before executing it. This is why e.g. command -opt argument is interpreted as calling "command" with the two arguments "-opt" and "argument" and not with the single argument "-opt argument". By default, field splitting happens at whitespace (spaces, tabs and newlines). Quoting in fact has two purposes: first, to prevent this field splitting - this is what double quotes are mainly used for - and second, to revert characters with a special meaning to the shell back to normal characters. Try e.g.:

var="foo bar"
echo $var

Notice, btw., the double quotes in the assignment: they prevent the field splitting. Without them the shell would complain, because var=foo would be a valid assignment and bar would be taken as a command to run, which most likely does not exist - hence the complaint. This is field splitting at work. Now try:

var="foo bar"
echo '$var'

And notice the difference in output. This is because the single quotes have stripped the special meaning of "$" from the character and therefore "$var" is no longer meaning "expand this to the content of variable var" but just a string consisting of the four characters "$", "v", "a" and "r". Btw.: it is a common misconception that quotes can be nested: "....'...'..." . This is not the case at all. Inside a quote everything is a normal character until this quote is closed. The string before is just a double-quoted string with two characters that happen to be single-quote characters with no special meaning.
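
A quick way to watch field splitting and quoting at work is to count the arguments a command actually receives. This is only a sketch; count_args is a hypothetical helper invented for the demo.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

var="foo bar"
count_args $var      # unquoted: split into "foo" and "bar", prints 2
count_args "$var"    # double quotes: one argument "foo bar", prints 1
count_args '$var'    # single quotes: the literal four characters $var, prints 1
```

The same thing happens to a file name like 'space test dupe10.jpg' when the variable holding it is expanded without double quotes.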

Now, in light of all this, let us look again at your codeline:

if [ -f "$dir2 -eq ${FN}" ]; 

It is obvious that the double quotes are misplaced, they should be surrounding the variables, which would protect the script from breaking when these variables contain spaces:

if [ -f "$dir2" -eq "${FN}" ]; 

But there is another issue with this line and it has nothing to do with quoting. Let us first examine the if -statement and how it works: if gets a command as argument, executes it and if this returns 0 (this is the UNIX way of programs saying they were successful) the then -part is executed, otherwise the else -part if there is one. Here is an example:

ls /etc ; echo "The return code is: $?"

This lists the content of the /etc -directory and because it exists ls will return 0 (or "TRUE") which is confirmed in the following echo-statement. If the directory didn't exist then ls would return something else (anything else is "FALSE"):

ls /this/does/not/exist ; echo "The return code is: $?"

We could use this in the if -statement - we won't even need ls 's output, just its return code:

if ls /etc > /dev/null ; then
     echo "ls returned 0"
else
     echo "ls returned something else"
fi

Plug in some non-existing directory instead of /etc and you will see how it works.

Coming back to your line:

if [ -f "$dir2" -eq "${FN}" ] ; then

Which program is called here? It is - and you might not have guessed that - the program [ . Yes, ridiculous as it seems, this is really the name of it and in fact this is another name for the program test . You see, when shells were first created and the mechanism described above of plugging any command into if was invented they invented test to do what usually if-statements in other programming languages do. So you had lines like:

if test "$x" == "$y" ; then

This works (in fact, up to now), but programmers were used to another style, like in C:

if( x == y)

So someone "invented" a link from /bin/test to /bin/[ and now lines looked like:

if [ "$x" == "$y" ; then

This resembled what they were used to much more but opening a bracket that wasn't closed was among the things "good people won't do", so /bin/test was changed: if it was called as [ it would expect a ] as last argument! We had the syntax we know today:

if [ "$x" == "$y" ] ; then

So in fact this is a call to test with the arguments "$x", "==" and "$y" - and the last canonical argument "]". If you ever want to know something about "[" and how it works - consult the man page of "test"!
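
To convince yourself that both spellings are the same command, here is a short sketch. The function names are made up for the demo, and the single = is used because it is the spelling the test man page documents. One caveat I am adding: in bash, test and [ are built into the shell rather than run from /bin, which the last line shows.

```shell
#!/bin/bash
x="flowers.jpg"
y="flowers.jpg"

# same_test and same_bracket are hypothetical names used only for this demo.
same_test()    { if test "$x" = "$y"; then echo yes; else echo no; fi; }
same_bracket() { if [ "$x" = "$y" ]; then echo yes; else echo no; fi; }

same_test       # prints yes
same_bracket    # prints yes
type [          # in bash this reports that [ is a shell builtin
```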

Now let us, in light of this, examine what you fed poor test as arguments:

test -f "$dir2" -eq "${FN}"

-f expects a single argument after it and test will return TRUE if this second argument is a file and FALSE if not. Basically

if [ -f "$dir2" ] ; then

would say: "if dir2 exists and is a file then do...". -eq on the other hand expects two operands: one before and one after it. It is intended for NUMERICAL (only numerical!) comparisons and tests for equality, like the name suggests. Try this:

if [ 1 -eq 1 ] ; then
    echo "these are equal"
else
    echo "these are not"
fi

Change "1" to any other number and watch the result again.

So the problem is: you cannot mix two different conditions and most likely this was not what you intended anyway. What you probably wanted to compare was filenames which are basically strings. You cannot compare strings with the -eq (or the -lt , -le , -gt or -ge ) operators because they only work on numbers - integers, to be precise. For strings there are the == and the != operators, which test for equality or non-equality.
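
The difference between the two kinds of operators shows up when the operands look alike as numbers but differ as strings. A minimal sketch with made-up values; compare_num and compare_str are hypothetical helpers, and the single = is the string operator documented for test:

```shell
#!/bin/bash
# compare_num and compare_str are hypothetical helpers for illustration.
compare_num() { if [ "$1" -eq "$2" ]; then echo equal; else echo different; fi; }
compare_str() { if [ "$1" = "$2" ]; then echo equal; else echo different; fi; }

compare_num 1 01        # as integers 1 and 01 have the same value: equal
compare_str 1 01        # as strings "1" and "01" differ: different
compare_str abc abc     # identical strings: equal
```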

You perhaps want to find out how to correctly phrase your condition yourself, so I won't spoil the fun. Happy programming.

I hope this helps.

bakunin


Hi Chubler_XL,
Your example works fine for the example you've chosen, but, as mattz40 mentioned in post #4 in this thread, it won't work if one or more of the filenames in the lists in the expansions of $dir1 and $dir2 contain a <space> character.
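
One way to keep the shape of that demo and still survive spaces is to hold the names in bash arrays instead of plain strings. This is only a sketch with made-up sample names; the function name classify is an assumption, and arrays require bash rather than a plain POSIX sh.

```shell
#!/bin/bash
# Made-up sample names; an array keeps each name as one element, spaces included.
dir1=("holiday.jpg" "space test dupe10.jpg" "beach.jpg")
dir2=("xmas.jpg" "space test dupe10.jpg")

# classify is a hypothetical function wrapping the demo's nested loop.
classify() {
    local FN FO found
    for FN in "${dir1[@]}"; do
        found=0
        for FO in "${dir2[@]}"; do
            if [ "$FN" = "$FO" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 1 ]; then
            echo "Duplicate $FN"
        else
            echo "Unique $FN"
        fi
    done
}

classify
```

The quoted "${dir1[@]}" expansion hands each element to the loop as a single word, so 'space test dupe10.jpg' is compared as one name.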

Hi mattz40,
Maybe you would want to try something more like:

#!/bin/bash
# Compare file names in source and target directories
# Move duplicates from source to duplicates directory (not yet implemented)
# Move remaining files in source to target directory (not yet implemented)
# Only care about filenames; not checksum, date, time

dir1="/mnt/nas/source"
dir2="/mnt/nas/target"
dir3="/mnt/nas/Duplicates"

for Path1 in "$dir1"/*.*
do	File1="${Path1##*/}"
	found=0
	for Path2 in "$dir2"/*.*
	do	if [ "$File1" = "${Path2##*/}" ]
		then	found=1
			break
		fi
	done
	if [ $found -eq 1 ]
	then	echo "Duplicate: \"$File1\""
	else	echo "Unique: \"$File1\""
	fi
done

Note that the comment line in your sample code:

# Only care about files names, not upper lower case, checksum, date, time

has been modified because this code does care about case in filenames. You can modify the code if you want to allow case-insensitive matches, but that is not the way normal UNIX/Linux/BSD filesystems work.
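
If you do want case-insensitive matching, one possible approach (a sketch, not the only way) is to lowercase both names before comparing. The helper name same_name_nocase is made up, and the ${var,,} expansion assumes bash 4.0 or later.

```shell
#!/bin/bash
# same_name_nocase is a hypothetical helper; ${var,,} needs bash 4.0 or later.
same_name_nocase() {
    [ "${1,,}" = "${2,,}" ]   # compare the lowercased forms of both names
}

if same_name_nocase "Flowers.JPG" "flowers.jpg"; then
    echo "Duplicate (ignoring case)"
else
    echo "Unique"
fi
```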

If the directory /mnt/nas/source contains the files:

birds and bees.jpg
flowers.jpg
hive.jpg

and the directory /mnt/nas/target contains the files:

beehive.jpg
birds and bees.jpg
flowers.jpg

then the above code will produce the output:

Duplicate: "birds and bees.jpg"
Duplicate: "flowers.jpg"
Unique: "hive.jpg"

Note also that the dir1 and dir2 variables now contain the full pathnames of the source and target directories; not lists of words contained in filenames in those directories. The Path1 and Path2 variables contain absolute pathnames of a file in the source and target directories, respectively and the File1 variable contains the filename of the last component of the pathname in the expansion of $Path1 .
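
For the two steps marked "not yet implemented", here is one possible sketch. It swaps the inner loop for a direct existence test on the target path, which is my substitution and not what the code above does. The function name sort_new_files is made up; the three paths in the commented-out call are the ones from this thread. Try it on copies of your files first.

```shell
#!/bin/bash
# sort_new_files is a hypothetical function name. Arguments: source directory,
# target directory, duplicates directory.
sort_new_files() {
    local src=$1 dst=$2 dup=$3 path name
    for path in "$src"/*.*; do
        [ -f "$path" ] || continue      # also skips the literal pattern if nothing matched
        name=${path##*/}                # file name without the directory part
        if [ -e "$dst/$name" ]; then
            mv -- "$path" "$dup/"       # same name already exists in target
        else
            mv -- "$path" "$dst/"       # new name: move it into target
        fi
    done
}

# Assumed paths, taken from the thread; uncomment to run for real:
# sort_new_files /mnt/nas/source /mnt/nas/target /mnt/nas/Duplicates
```

Note that mv will silently replace a file of the same name that is already sitting in the duplicates directory, so you may want to empty that directory regularly or add a check of your own.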

Note that bakunin gave an excellent explanation of what was wrong in your if [ ... ] expression. But I have to disagree with one point. The standard test expression and [ expression ] string equality operator is a single <equals-sign>, not the double <equals-sign> that is used in the C Language. Some shells will accept both, some shells will give you a syntax error if you use the double <equals-sign>, and some manual pages for some shells will say that the single <equals-sign> form is deprecated, but that is not what the shell standards say. I don't know of any shells that do not accept the single <equals-sign> form that is required by the standards.

(Some shells also have a [[ expression ]] in which [[ is a shell keyword; not the name of a utility used to evaluate expressions. In the shells that understand [[ expression ]] , the string equality operator in expression in this form is the double <equals-sign> in all shells that I've used. And some shells accept a single <equals-sign> operator in this form with a similar, but not always identical, meaning.)
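
As a small illustration of how [[ expression ]] behaves in bash specifically (other shells may differ), here is a sketch. The helper names are made up for the demo. The point is that an unquoted right-hand side of == is treated as a pattern, while a quoted one is a literal string.

```shell
#!/bin/bash
# is_jpg and is_literal_star_jpg are hypothetical helpers for this demo.
is_jpg()              { [[ $1 == *.jpg ]]; }     # unquoted right side: a pattern
is_literal_star_jpg() { [[ $1 == "*.jpg" ]]; }   # quoted right side: a literal string

name="space test dupe10.jpg"
is_jpg "$name" && echo "matches the pattern *.jpg"
is_literal_star_jpg "$name" || echo "is not literally named *.jpg"
```

Inside [[ ]] the unquoted $1 is not field-split either, which is why the name with spaces passes through intact.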


WOW! Bakunin, Chubler and Don, thank you so much for the tour de force in explainiology. Aside from how helpful that was you folks renewed my faith in humanity. There are really good people out there and you're not just in that number, you're leading the pack.

That was better than I imagine a boot camp would be.