Script to compare files in 2 folders and delete the larger file

Hello, my first thread here.

I've been searching and fiddling around for about a week and I cannot find a solution.

I have been converting all of my home videos to HEVC and sometimes the files end up smaller and sometimes they don't. I am currently comparing all the video files manually and it takes up quite a bit of time.

I was wondering if there is a script that can check the 2 folders and delete the larger of the 2 files and keep the smaller one.

I have the original videos in one directory and the converted in another directory. The filenames are always the same but sometimes the extensions are different.

E.g., the destination output file will always have the .mkv extension, but the original may have .avi, .mpg, .mp4, etc. The filenames themselves will always be the same.

Welcome to the forum!

What you request is certainly possible and may have been posted, at least in part, in these forums. Did you try a search with your keywords? When "fiddling around", what did you attempt, and with what tools? Where did the attempts fail, or where did you get stuck?

Please get used to providing decent context for your problem.
It is always helpful to support a request with system info such as OS and shell, the related environment (variables, options), preferred tools, representative sample input, the desired output and the logic connecting the two, and, if there are any, system (error) messages quoted verbatim. This avoids ambiguities and keeps people from guessing.

I searched many forums before deciding to join here. I usually find a solution but this one has proven to be hard to find.

If the answer has already been posted I apologize for creating another post about it.

I am running Ubuntu 17.04 Server on my encoding machine which sits in my basement and I access it through SSH. I am thinking of utilizing diff along with a bash script to determine whether the original or re-encoded file is smaller and then have it delete the larger of the 2 files.

I got as far as playing around with diff a little bit but I am not a script writer so I have no idea how to implement what I want to do into an efficient script.

I will try searching the forum again.

If I understand correctly:

You have one filename with several different extensions (or, in Windows terms, file types), for example:
filename.aa filename.qb filename.abcd and maybe more.

If this is correct you need to aggregate all of the complete filenames by just the part before the dot in the filename.
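
One detail worth checking when taking "the part before the dot": if a name contains more than one dot, shell parameter expansion can cut at the last dot or at the first one. A small sketch, using an example name of the kind that appears later in the thread (the variable names are only for illustration):

```shell
#!/bin/bash
# Sketch: two ways to strip an extension with parameter expansion.
name=home.video.01.avi

short=${name%.*}      # removes the shortest trailing ".something"
shorter=${name%%.*}   # removes the longest trailing ".something"

echo "$short"         # home.video.01
echo "$shorter"       # home
```

With dotted names like this one, only the first form keeps the names of different videos distinct.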

What you need for input is

 the filename with no directory name and without a type
 size of the file in bytes
 the full filename  (directory/filename.filetype)

The output has to be the full filename, and maybe the size, of only the largest file (in bytes) in each group.

You then LOOK at the output to make sure you did not screw up somehow, right?

Then finally you feed the full filenames in the output file to the rm command.

So:

# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
#  you now have all the file names
#
# rewrite /tmp/list to have the correct values
while read fname   # fname is the complete file name
do
      shortfile=$(basename $fname)
      shortfile=${shortfile%%.*}
      size=(stat -c '%s' $fname)
      
      print " $shortfile $size $fname"
done < /tmp/list > /tmp/next

# /tmp/next has the data, so let's sort and aggregate it -  assuming no spaces in the shortfile name
# sort by shortfile, then numerically by size, so the largest file of each group comes last

sort -k1,1 -k2,2n -o /tmp/next /tmp/next

# aggregate
# awk fields are $1 - shortfile, $2 - size,  $3 - fullname
awk '{ 
         arr($1)=$3 " " $2  # note that the last values to be stored for shortfile
                                   # come from  the last time shortfile is in the file
                                   
        }
         END { for (i in arr) {print arr(i)} }
        ' /tmp/next > /tmp/final
        
# delete ONLY after you check /tmp/final
while read -r fname size   # each line of /tmp/final is "fullname size"
do 
     rm -- "$fname"
done < /tmp/final

This code is meant more for learning than for production use. Others will show you how to make it more efficient. You need to understand this one first.
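
As one possible shortcut, and only as a sketch (it is not the approach shown above), the pairs could also be compared directly: loop over the converted .mkv files, look for an original with the same name minus the extension, and print the larger file of each pair. The function name `list_larger` and its two directory arguments are assumptions made for illustration. It assumes GNU `stat` (as on Ubuntu), skips pairs where either file is empty, and only prints candidates; it deletes nothing.

```shell
#!/bin/bash
# Sketch: print the larger file of each original/converted pair.
# list_larger is a hypothetical helper; it never deletes anything.
list_larger() {
    local orig_dir="$1" conv_dir="$2" conv stem orig osize csize
    for conv in "$conv_dir"/*.mkv; do
        [ -f "$conv" ] || continue               # no .mkv files found
        stem=$(basename "$conv" .mkv)            # name without the .mkv suffix
        csize=$(stat -c '%s' "$conv")
        for orig in "$orig_dir/$stem".*; do
            [ -f "$orig" ] || continue           # no matching original
            osize=$(stat -c '%s' "$orig")
            if [ "$osize" -eq 0 ] || [ "$csize" -eq 0 ]; then
                continue                         # skip empty (possibly failed) files
            fi
            if [ "$csize" -lt "$osize" ]; then
                printf '%s\n' "$orig"            # converted copy is smaller
            else
                printf '%s\n' "$conv"            # converted copy is not smaller
            fi
        done
    done
}

# Example call (paths are placeholders):
# list_larger /path/to/directory1 /path/to/directory2 > /tmp/final
```

The printed list still needs to be reviewed by eye before anything is fed to rm.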

Hello, thank you for your help. I have run into a snag, though. Here is what I get when I attempt to create /tmp/next.

When executed without sudo:

josh52180@MediaBox:~$ ./next.sh
Error: no such file "home stat /home/josh52180/originals/home.video.01.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.02.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.04.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.05.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.06.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.07.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.08.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.10.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.11.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.12.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.13.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.14.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.16.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.17.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.18.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.19.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.01.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.02.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.04.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.05.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.06.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.07.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.08.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.10.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.11.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.12.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.13.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.14.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.16.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.17.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.18.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.19.mkv"

When executed with sudo:

josh52180@MediaBox:~$ sudo ./next.sh
./next.sh: 5: ./next.sh: Syntax error: "(" unexpected (expecting "done")
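
The two different failures are consistent with a missing `$` in the `size=` line, though that is an inference from the output rather than something confirmed here. In bash, `size=(stat ...)` assigns an array of words instead of running `stat`, so `$size` expands to the literal word `stat`, which is what shows up in the messages above. A plain POSIX `sh`, which presumably ran the script under sudo, has no arrays and stops with a syntax error at `(`. The `Error: no such file` wording likely comes from whatever `print` resolves to on that system, which is not the shell's `echo`. A minimal sketch of the first point, using a temporary stand-in file:

```shell
#!/bin/bash
# Sketch: array assignment versus command substitution (bash).
f=$(mktemp)                      # stand-in file, not one of the videos
printf '12345' > "$f"            # 5 bytes

wrong=(stat -c '%s' "$f")        # array of four words; stat is never run
right=$(stat -c '%s' "$f")       # runs stat and captures its output

echo "wrong: $wrong"             # prints the first array element: stat
echo "right: $right"             # prints the size in bytes: 5
rm -f "$f"
```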

Three typos made this a mess. I also added a check for zero-length files; you can remove it if you don't need it.
Apologies. Thanks, Rudi, for spotting the problem.

# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
#  you now have all the file names
#
# rewrite /tmp/list to have the correct values
while read -r fname   # fname is the complete file name
do
      shortfile=$(basename "$fname")
      shortfile=${shortfile%.*}   # strip only the last extension: home.video.01.avi -> home.video.01
      size=$(stat -c '%s' "$fname")
      [ "$size" -eq 0 ] && continue # skip zero-length files 
      echo "$shortfile $size $fname"
done < /tmp/list > /tmp/next

# /tmp/next has the data, so let's sort and aggregate it -  assuming no spaces in the shortfile name
# sort by shortfile, then numerically by size, so the largest file of each group comes last

sort -k1,1 -k2,2n -o /tmp/next /tmp/next

# aggregate
# awk fields are $1 - shortfile, $2 - size,  $3 - fullname
awk '{ 
         arr[$1]=$3 " " $2  # note that the last values to be stored for shortfile
                                   # come from  the last time shortfile is in the file
                                   
        }
         END { for (i in arr) {print arr[i]} }
        ' /tmp/next > /tmp/final
        
# removed the rm stuff for now
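
Since the rm step is left out above, here is one cautious way it could be put back, as a sketch only. It assumes each line of /tmp/final has the form "fullname size" with no spaces in the path. The function `remove_listed` is a hypothetical helper: it only prints what it would do unless its second argument is the word `delete`.

```shell
#!/bin/bash
# Sketch: review-first deletion driven by a list file.
# remove_listed is a hypothetical helper; lines are "fullname size".
remove_listed() {
    local list="$1" mode="${2:-dry-run}" fname size
    while read -r fname size; do
        [ -f "$fname" ] || continue          # ignore stale or empty lines
        if [ "$mode" = "delete" ]; then
            rm -- "$fname"
        else
            echo "would remove: $fname ($size bytes)"
        fi
    done < "$list"
}

# remove_listed /tmp/final           # dry run: only prints
# remove_listed /tmp/final delete    # actually removes the files
```

Running the dry run first keeps the "check /tmp/final before deleting" step from being skipped by accident.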