Making a script to copy files not seen before (using md5sum)

I am glad to hear that.
Please mark this thread as solved by going to the "Thread tools" menu and clicking on "Mark this thread as solved", for the future reference of other users.
Also, I would suggest taking a look at this great book, The Linux Command Line, which contains lots of useful information on Linux commands and scripting. The latest edition was published last month :). Good luck!

Bummer. My testing routine was not specific enough. And shortly after I posted my "it's all fixed" message, I realized it was probably not.

The "photos-backup" folder is supposed to house the "DCIM" folder from the phone, which in turn has subfolders (e.g. 100MEDIA, 101MEDIA, etc.).

So I put the "set -x" back in the script:

#!/bin/bash
set -x

# The username variable passed by command line to this script
USER=$1

# The source directory where the photo folder on the phone is mirrored to
SRC=/hd1/home/$USER/.phonesync/photos-backup

# The destination directory where we want to copy only new photos we have not copied before
DST=/hd1/home/$USER/.phonesync/photos-new

# The MD5 list file that tracks which files we have copied before
MD5=/hd1/home/$USER/.phonesync/photos-backup.md5

# Check files against the MD5 list and then copy if not previously copied
# Then add the md5 for that file to the MD5 list
cd $SRC
for f in *
do
  FMD5=$(md5sum $f)
  grep -q $FMD5 $MD5
  if [[ $? -ne 0 ]]; then
    cp $f $DST
    md5sum $f >> $MD5
  fi
done

# In case this script gets run as root, redo the file ownership so users can access their photos
chown -R $USER:$USER $DST

And ran it... and here is the output.

nick@server ~/.phonesync$ ./photos-new.sh nick
+ USER=nick
+ SRC=/hd1/home/nick/.phonesync/photos-backup
+ DST=/hd1/home/nick/.phonesync/photos-new
+ MD5=/hd1/home/nick/.phonesync/photos-backup.md5
+ cd /hd1/home/nick/.phonesync/photos-backup
+ for f in '*'
++ md5sum DCIM
md5sum: DCIM: Is a directory
+ FMD5=
+ grep -q /hd1/home/nick/.phonesync/photos-backup.md5
^C

Had to CTRL+C to stop the script.

The folder structure of photos-backup is...

nick@server ~/.phonesync$ ls * -a -R -l
-rw-r--r-- 1 nick nick   46 Sep  4 10:42 photos-backup.md5
-rwxr-xr-x 1 nick nick  841 Sep  4 11:11 photos-new.sh

photos-backup:
total 12
drwxr-xr-x 3 nick nick 4096 Sep  4 11:15 .
drwxr-xr-x 4 nick nick 4096 Sep  4 10:42 ..
drwxr-xr-x 3 nick nick 4096 Sep  4 11:09 DCIM

photos-backup/DCIM:
total 12
drwxr-xr-x 3 nick nick 4096 Sep  4 11:09 .
drwxr-xr-x 3 nick nick 4096 Sep  4 11:15 ..
drwxr-xr-x 2 nick nick 4096 Sep  4 11:10 100MEDIA

photos-backup/DCIM/100MEDIA:
total 4080
drwxr-xr-x 2 nick nick    4096 Sep  4 11:10 .
drwxr-xr-x 3 nick nick    4096 Sep  4 11:09 ..
-rw-r--r-- 1 nick nick 1243984 Jun 28  2012 IMAG0001.jpg
-rw-r--r-- 1 nick nick 1551828 Jun 28  2012 IMAG0002.jpg
-rw-r--r-- 1 nick nick 1369884 Jun 28  2012 IMAG0003.jpg

photos-new:
total 8
drwxr-xr-x 2 nick nick 4096 Sep  4 11:15 .
drwxr-xr-x 4 nick nick 4096 Sep  4 10:42 ..

So how do I modify this script for it to carry out the copies for all subfolders of "photos-backup"?

I see. Basically, what it's saying is that you can't md5sum a directory.
Here's what I would do. Once you change directory to $SRC ( cd $SRC ):

  • Check if each file is a regular file or a directory.
  • If it's a regular file, continue with the current for loop.
  • If it's a directory, change to this directory and then you'll have to write another for loop to perform the same test with each file inside. Once done, go to its parent directory and resume the main loop.

Writing a for loop with a couple of conditionals is not that difficult. If you need help determining whether a given entry is a regular file or a directory (as Unix or Linux sees it), let us know.
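
Here is a minimal sketch of the steps above, assuming bash. The function name walk_tree and the throwaway demo tree are made up for illustration; the printf line marks the spot where the md5sum/grep/cp logic from your script would go. Instead of writing a second loop by hand, the function simply calls itself for each directory it meets.

```shell
#!/bin/bash
# Sketch: walk a directory tree by hand, testing each entry's type.
# walk_tree is a hypothetical helper name, not part of the original script.
walk_tree() {
  local dir=$1 entry
  for entry in "$dir"/*
  do
    if [ -f "$entry" ]; then
      # Regular file: the md5sum/grep/cp logic would go here
      printf 'file: %s\n' "$entry"
    elif [ -d "$entry" ]; then
      # Directory: descend and apply the same test to its contents
      walk_tree "$entry"
    fi
  done
}

# Demo on a throwaway tree that mimics the phone layout
demo=$(mktemp -d)
mkdir -p "$demo/DCIM/100MEDIA"
touch "$demo/DCIM/100MEDIA/IMAG0001.jpg" "$demo/DCIM/100MEDIA/IMAG0002.jpg"
found=$(walk_tree "$demo")
printf '%s\n' "$found"
rm -rf "$demo"
```

The -f test is true only for regular files and -d only for directories, so anything else (and the unexpanded "*" of an empty directory) is skipped.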

Do you want to copy $SRC/DCIM/100MEDIA/IMAG000[1-3].jpg to $DST/DCIM/100MEDIA/IMAG000[1-3].jpg or to $DST/IMAG000[1-3].jpg ?

Do all of the files you want to copy have names that end with .jpg ?

Hi Don,

The phone stores both photos and videos in the DCIM structure, so I'd prefer not to limit to a single extension. If I have to call out extensions, I would want the flexibility to supply more than one to look for.

And I think I would prefer to copy $SRC/DCIM/100MEDIA/IMAG000[1-3].jpg to $DST/DCIM/100MEDIA/IMAG000[1-3].jpg rather than dump all files in one $DST location - just in case there are filename duplicates within a "100MEDIA" directory vs. a "101MEDIA" directory that might be under the DCIM directory.
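
If it does come down to calling out extensions, something like the following might do. This is only a sketch: the extension list (jpg, mp4, 3gp) and the demo file names are my own assumptions, not what the phone necessarily produces.

```shell
#!/bin/bash
# Sketch: list only files whose names end in one of several extensions.
# The extension list (jpg, mp4, 3gp) is an assumption for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/DCIM/100MEDIA"
touch "$demo/DCIM/100MEDIA/IMAG0001.jpg" \
      "$demo/DCIM/100MEDIA/VIDEO0001.mp4" \
      "$demo/DCIM/100MEDIA/notes.txt"

# -iname matches case-insensitively (supported by GNU and BSD find);
# the escaped parentheses group the -o (OR) alternatives.
matches=$(cd "$demo" && find . -type f \( -iname '*.jpg' -o -iname '*.mp4' -o -iname '*.3gp' \) | sort)
printf '%s\n' "$matches"
rm -rf "$demo"
```

Adding another extension is then just one more -o -iname '*.ext' inside the parentheses.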

You could try changing:

cd $SRC
for f in *
do
  FMD5=$(md5sum $f)
  grep -q $FMD5 $MD5
  if [[ $? -ne 0 ]]; then
    cp $f $DST
    md5sum $f >> $MD5
  fi
done

in your script to something like:

cd $SRC
find . -type f | while IFS= read -r f
do
  FMD5=$(md5sum $f)
  grep -q $FMD5 $MD5
  if [[ $? -ne 0 ]]; then
    if [ ! -d $DST/${f%/*} ]; then
      mkdir -p $DST/${f%/*}
    fi
    cp $f $DST/$f
    md5sum $f >> $MD5
  fi
done
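
One caveat about the loop above: because $FMD5 and $f are unquoted, the shell splits the md5sum output into the checksum and the file name, so grep looks for the checksum in both the photo itself and the MD5 list, and a file name containing a space would break the md5sum and cp commands. Below is a hedged sketch of a quoted variant that compares only the checksum. The function name copy_new and the demo paths are hypothetical, and it assumes the destination and list paths are absolute, as they are in your script.

```shell
#!/bin/bash
# Sketch of a quoted variant; copy_new is a hypothetical function name.
# Usage: copy_new SRC_DIR DST_DIR MD5_LIST   (DST_DIR and MD5_LIST absolute)
copy_new() {
  local src=$1 dst=$2 list=$3 f sum
  touch "$list"                    # make sure the list exists before grep reads it
  ( cd "$src" || exit 1
    find . -type f | while IFS= read -r f
    do
      sum=$(md5sum < "$f")         # reading from stdin prints "<hash>  -"
      sum=${sum%% *}               # keep only the hash
      if ! grep -qF -- "$sum" "$list"; then
        mkdir -p "$dst/${f%/*}"    # recreate the subdirectory if needed
        cp -- "$f" "$dst/$f"
        md5sum "$f" >> "$list"
      fi
    done )
}

# Demo on a throwaway tree, including a file name with a space
demo=$(mktemp -d)
mkdir -p "$demo/src/DCIM/100MEDIA" "$demo/dst"
printf 'one' > "$demo/src/DCIM/100MEDIA/IMAG 0001.jpg"
copy_new "$demo/src" "$demo/dst" "$demo/list.md5"
copy_new "$demo/src" "$demo/dst" "$demo/list.md5"   # second run finds nothing new
copied=$(find "$demo/dst" -type f | wc -l)
listed=$(wc -l < "$demo/list.md5")
echo "copied=$copied listed=$listed"
rm -rf "$demo"
```

The mkdir -p call needs no "does it exist" test around it, since -p is silent when the directory is already there.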

OK... So I modified the code of "photos-new.sh" to be as follows:

#!/bin/bash
set -x

# The username variable passed by command line to this script
USER=$1

# The source directory where the photo folder on the phone is mirrored to
SRC=/hd1/home/$USER/.phonesync/photos-backup

# The destination directory where we want to copy only new photos we have not copied before
DST=/hd1/home/$USER/.phonesync/photos-new

# The MD5 list file that tracks which files we have copied before
MD5=/hd1/home/$USER/.phonesync/photos-backup.md5

# Check files against the MD5 list and then copy if not previously copied
# Then add the md5 for that file to the MD5 list

cd $SRC
find . -type f | while IFS= read -r f
do
  FMD5=$(md5sum $f)
  grep -q $FMD5 $MD5
  if [[ $? -ne 0 ]]; then
    if [ ! -d $DST/${f%/*} ]; then
      mkdir -p $DST/${f%/*}
    fi
    cp $f $DST/$f
    md5sum $f >> $MD5
  fi
done

# In case this script gets run as root, redo the file ownership so users can access their photos
chown -R $USER:$USER $DST

I ran the script and got the following output:

nick@server ~/.phonesync$ ./photos-new.sh nick
+ USER=nick
+ SRC=/hd1/home/nick/.phonesync/photos-backup
+ DST=/hd1/home/nick/.phonesync/photos-new
+ MD5=/hd1/home/nick/.phonesync/photos-backup.md5
+ cd /hd1/home/nick/.phonesync/photos-backup
+ find . -type f
+ IFS=
+ read -r f
++ md5sum ./DCIM/100MEDIA/IMAG0003.jpg
+ FMD5='c0bd05642752af82a79fef52fffb3120  ./DCIM/100MEDIA/IMAG0003.jpg'
+ grep -q c0bd05642752af82a79fef52fffb3120 ./DCIM/100MEDIA/IMAG0003.jpg /hd1/home/nick/.phonesync/photos-backup.md5
+ [[ 1 -ne 0 ]]
+ '[' '!' -d /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA ']'
+ mkdir -p /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA
+ cp ./DCIM/100MEDIA/IMAG0003.jpg /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA/IMAG0003.jpg
+ md5sum ./DCIM/100MEDIA/IMAG0003.jpg
+ IFS=
+ read -r f
++ md5sum ./DCIM/100MEDIA/IMAG0001.jpg
+ FMD5='ccf0730cdc59d92323465401905b9a79  ./DCIM/100MEDIA/IMAG0001.jpg'
+ grep -q ccf0730cdc59d92323465401905b9a79 ./DCIM/100MEDIA/IMAG0001.jpg /hd1/home/nick/.phonesync/photos-backup.md5
+ [[ 1 -ne 0 ]]
+ '[' '!' -d /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA ']'
+ cp ./DCIM/100MEDIA/IMAG0001.jpg /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA/IMAG0001.jpg
+ md5sum ./DCIM/100MEDIA/IMAG0001.jpg
+ IFS=
+ read -r f
++ md5sum ./DCIM/100MEDIA/IMAG0002.jpg
+ FMD5='9a0d8d0d82690ecf7c690fe386679ae3  ./DCIM/100MEDIA/IMAG0002.jpg'
+ grep -q 9a0d8d0d82690ecf7c690fe386679ae3 ./DCIM/100MEDIA/IMAG0002.jpg /hd1/home/nick/.phonesync/photos-backup.md5
+ [[ 1 -ne 0 ]]
+ '[' '!' -d /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA ']'
+ cp ./DCIM/100MEDIA/IMAG0002.jpg /hd1/home/nick/.phonesync/photos-new/./DCIM/100MEDIA/IMAG0002.jpg
+ md5sum ./DCIM/100MEDIA/IMAG0002.jpg
+ IFS=
+ read -r f
+ chown -R nick:nick /hd1/home/nick/.phonesync/photos-new

And my file structure shows the files were copied:

nick@server ~/.phonesync$ ls -a -l -R
.:
total 24
drwxr-xr-x 4 nick nick 4096 Sep  4 10:42 .
drwxr-xr-x 7 nick nick 4096 Sep  4 10:37 ..
drwxr-xr-x 3 nick nick 4096 Sep  4 11:15 photos-backup
-rw-r--r-- 1 nick nick  235 Sep 10 14:50 photos-backup.md5
drwxr-xr-x 3 nick nick 4096 Sep 10 14:50 photos-new
-rwxr-xr-x 1 nick nick  942 Sep 10 14:30 photos-new.sh

./photos-backup:
total 12
drwxr-xr-x 3 nick nick 4096 Sep  4 11:15 .
drwxr-xr-x 4 nick nick 4096 Sep  4 10:42 ..
drwxr-xr-x 3 nick nick 4096 Sep  4 11:09 DCIM

./photos-backup/DCIM:
total 12
drwxr-xr-x 3 nick nick 4096 Sep  4 11:09 .
drwxr-xr-x 3 nick nick 4096 Sep  4 11:15 ..
drwxr-xr-x 2 nick nick 4096 Sep  4 11:10 100MEDIA

./photos-backup/DCIM/100MEDIA:
total 4080
drwxr-xr-x 2 nick nick    4096 Sep  4 11:10 .
drwxr-xr-x 3 nick nick    4096 Sep  4 11:09 ..
-rw-r--r-- 1 nick nick 1243984 Jun 28  2012 IMAG0001.jpg
-rw-r--r-- 1 nick nick 1551828 Jun 28  2012 IMAG0002.jpg
-rw-r--r-- 1 nick nick 1369884 Jun 28  2012 IMAG0003.jpg

./photos-new:
total 12
drwxr-xr-x 3 nick nick 4096 Sep 10 14:50 .
drwxr-xr-x 4 nick nick 4096 Sep  4 10:42 ..
drwxr-xr-x 3 nick nick 4096 Sep 10 14:50 DCIM

./photos-new/DCIM:
total 12
drwxr-xr-x 3 nick nick 4096 Sep 10 14:50 .
drwxr-xr-x 3 nick nick 4096 Sep 10 14:50 ..
drwxr-xr-x 2 nick nick 4096 Sep 10 14:50 100MEDIA

./photos-new/DCIM/100MEDIA:
total 4080
drwxr-xr-x 2 nick nick    4096 Sep 10 14:50 .
drwxr-xr-x 3 nick nick    4096 Sep 10 14:50 ..
-rw-r--r-- 1 nick nick 1243984 Sep 10 14:50 IMAG0001.jpg
-rw-r--r-- 1 nick nick 1551828 Sep 10 14:50 IMAG0002.jpg
-rw-r--r-- 1 nick nick 1369884 Sep 10 14:50 IMAG0003.jpg
nick@server ~/.phonesync$

This is very encouraging. I'll do some more testing and report back.

I'm glad to hear that. Keep up the good work! And let us know how it goes :).

OK... As far as I can tell this script is finished. Here was my testing process.

1) Delete a file from $DST. Rerun script. Did the file copy again from $SRC?
NO = PASS

2) Delete the entire directory structure under $DST. Delete the MD5 list file. Rerun script. Did it recreate the $DST structure from $SRC?
YES = PASS / NOTE: Must retain "photos-new/" folder, but can delete entire structure underneath that.

3) Delete the MD5 list file. Do not delete any files. Rerun script. What happens? Examine debug output.
PASS / Seems to cp and overwrite the files in $DST. This is acceptable.

4) Delete a file from $SRC. Rerun script. It should not copy any files. Did it?
PASS

5) Delete the directory structure under $DST. Delete MD5 list file. Run script as root. Does it work and assign ownership correctly to all child objects?
YES / PASS

6) Copy a file in $SRC to $SRC without changing its contents. Run script. Did the file copy over to $DST?
NO = PASS

7) Modify the file copied in test 6. Run script. Did the file copy over to $DST?
YES = PASS

8) Delete the MD5 list file. Do not delete any files in $SRC or $DST. Modify a file in $DST. Rerun script. What happens? Examine debug output.
PASS / Files that were modified in $DST were overwritten with files from $SRC with same filename if the MD5 list is deleted.
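
For anyone who wants to repeat checks like these without touching real photos, here is a rough sketch that automates tests 6 and 7 on a scratch tree. The sync_new function is a hypothetical stand-in for photos-new.sh: it takes the source, destination and MD5 list as absolute paths instead of a username, and it assumes the "skip any file whose checksum is already in the list" rule. The file names and contents are made up.

```shell
#!/bin/bash
# Sketch: automate tests 6 and 7 on a scratch tree.
# sync_new is a hypothetical stand-in for photos-new.sh.
sync_new() {
  local src=$1 dst=$2 list=$3 f sum
  touch "$list"
  ( cd "$src" || exit 1
    find . -type f | while IFS= read -r f
    do
      sum=$(md5sum < "$f"); sum=${sum%% *}
      grep -qF -- "$sum" "$list" && continue   # checksum seen before: skip
      mkdir -p "$dst/${f%/*}"
      cp -- "$f" "$dst/$f"
      md5sum "$f" >> "$list"
    done )
}

t=$(mktemp -d)
mkdir -p "$t/src/DCIM/100MEDIA" "$t/dst"
printf 'photo one' > "$t/src/DCIM/100MEDIA/IMAG0001.jpg"
sync_new "$t/src" "$t/dst" "$t/list.md5"

# Test 6: an identical copy under a new name should NOT be copied
cp "$t/src/DCIM/100MEDIA/IMAG0001.jpg" "$t/src/DCIM/100MEDIA/COPY.jpg"
sync_new "$t/src" "$t/dst" "$t/list.md5"
if [ -e "$t/dst/DCIM/100MEDIA/COPY.jpg" ]; then test6=FAIL; else test6=PASS; fi

# Test 7: once its contents change, the same file SHOULD be copied
printf ' edited' >> "$t/src/DCIM/100MEDIA/COPY.jpg"
sync_new "$t/src" "$t/dst" "$t/list.md5"
if [ -e "$t/dst/DCIM/100MEDIA/COPY.jpg" ]; then test7=PASS; else test7=FAIL; fi

echo "test 6: $test6"
echo "test 7: $test7"
rm -rf "$t"
```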

Now, someone told me to mark the thread as solved by using the "Thread Tools" menu, but my Thread Tools menu does not have the option to mark the thread solved. Am I missing something?

- Nick

I'm glad you were able to get your script working.

About 12 minutes after you posted the first message in this thread, you changed the thread's title from "Making a script to copy files not seen before (using md5sum)" to "Need to make a script to copy files not seen before (using md5sum)". To mark the thread solved, you just need to edit that message again, adding [Solved] to the start of the title. (I've done that for you this time; but now you know how to do it next time.)
