Script to compare file sizes and delete the smaller file

JC_1 · August 3, 2012, 5:08pm

I am pretty new to scripting, so I appreciate your advice in advance.
The problem:

I have 100 directories, each containing 2 files with random names and the same extension. The only attribute that distinguishes the two files is size. I would like to write a script that compares the files by size and deletes the smaller of the two. Thanks! - OSX, X11, bash

Corona688 · August 3, 2012, 5:38pm

Is every pair of extensions unique? If not, it's hard to be 100% sure about anything.

JC_1 · August 3, 2012, 5:51pm

The file names are random. The extension is the same ("extension" is probably Windows talk, sorry). I want the script to look in each directory and delete the smaller of the two files.
e.g.,

directory 1

asdfsdfo72392874983.nii.gz
asdfju07r-3828poafljkao.nii.gz

directory n

asdfopiw4-8rtlkjkkfa.nii.gz
ukjfsaoi04t0ifaf';lk'lk.nii.gz

Corona688 · August 3, 2012, 6:45pm

I think I misunderstood. They're in 100 different directories, not in one pile, ergo they can be differentiated. Okay.

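find /path/to/base -type d | while IFS= read -r LINE
do
        # ls -S lists the largest file first; the unquoted $(...) is split
        # on whitespace, so this assumes file names without spaces
        set -- $(ls -S "$LINE"/*.gz 2>/dev/null)
        [ "$#" -eq 2 ] || continue # Ignore folders that don't have 2 files

        echo "Keeping $1"
        echo rm "$2" # Dry run: remove the "echo" to actually delete
done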

JC_1 · August 3, 2012, 7:13pm

Thank you so much! I'm embarrassed to say I've been struggling with that for 2 days. (I have a hard time asking for help.) On the whole the script is beyond my vocabulary, but could you say how you actually made the size comparison here, and how you got that output into "$1" and "$2"?

alister · August 3, 2012, 7:28pm

It would probably be better to omit the quotes. The reason I say that is because no quotes makes it clear that the approach isn't intended to handle whitespace and pattern matching characters. As is, casual inspection may instill a false sense of security. Whatever those quotes protect against in the subshell will just bite in the parent shell.

Regards,
Alister
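
If whitespace in file names ever does matter, one option is to skip parsing ls output altogether. The following is only a sketch, not the script posted above: keep_larger and prune_smaller are made-up function names, sizes are read with wc -c (assumed to be acceptable for regular files), and like the original it only prints the rm command instead of running it.

```shell
#!/bin/bash
# Sketch only: keep_larger and prune_smaller are hypothetical names.

keep_larger() {
    local dir=$1 size1 size2
    # Expand the glob straight into $1, $2, ... so every file name stays
    # a single word, even if it contains spaces.
    set -- "$dir"/*.gz
    # An unmatched glob leaves one literal word, so this also skips
    # directories with no .gz files at all.
    [ "$#" -eq 2 ] || return 0

    # $(( )) strips the padding that some wc versions put before the number.
    size1=$(( $(wc -c < "$1") ))
    size2=$(( $(wc -c < "$2") ))

    if [ "$size1" -ge "$size2" ]; then
        echo "Keeping $1"
        echo rm "$2"    # dry run: remove "echo" to actually delete
    else
        echo "Keeping $2"
        echo rm "$1"    # dry run: remove "echo" to actually delete
    fi
}

prune_smaller() {
    # Directory names containing newlines would still break this loop.
    find "$1" -type d | while IFS= read -r d
    do
        keep_larger "$d"
    done
}
```

It would be called as prune_smaller /path/to/base. If both files are the same size, this sketch keeps whichever name sorts first.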

Corona688 · August 3, 2012, 10:06pm

ls -S sorts by file size. The largest one comes first.

As for how I get it into $1 $2, try this:

set -- a b
echo $1
echo $2
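
To see both pieces working together, here is a small throwaway demo. It is only an illustration: the temporary directory and the names small.gz and large.gz are invented for the example, and it assumes the temporary path contains no whitespace.

```shell
#!/bin/bash
# Throwaway demo: two files of different sizes in a temporary directory.
demo=$(mktemp -d)
printf 'tiny' > "$demo/small.gz"                  # 4 bytes
printf 'a much longer line' > "$demo/large.gz"    # 18 bytes

# ls -S prints the larger file first. The unquoted $( ) is split on
# whitespace, and set -- assigns the resulting words to $1, $2, ...
set -- $(ls -S "$demo"/*.gz)

count=$#
largest=$1
smallest=$2

echo "count:    $count"
echo "largest:  $largest"
echo "smallest: $smallest"

rm -r "$demo"    # clean up the demo files
```

The count should be 2, with the path ending in large.gz reported as the largest. That is the same mechanism the script uses to decide which file to keep.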