Remove Duplicate Files On Remote Servers

Hello,
I wrote a basic script that works; however, I was wondering if it could be sped up. I am comparing files over ssh and removing a file from the source server directory when a match occurs. Please advise me on my mistakes.

#!/bin/bash

for file in `ls /export/home/podcast2/"$1" ` ; do

    if [ "`ssh server1.stm ls /export/home/podcast/data2/$1/$file`" = "/export/home/podcast/data2/$1/$file" ]
        then

         rm -f /export/home/podcast2/$1/$file


    fi
done

I would execute the script as:

Prompt>./shellscript.sh arg1

Thanks,

Jaysunn

You are connecting to the server once for each file, which is slooooow; you should connect only one time. There are lots of ways of doing it; here's one:

ssh server ls remotedir | ( cd localdir && xargs -d"\n" rm )

Anyway, this whole idea is a bit weird. If this involves syncing files, rsync is the right tool.

Whoa,
This is exactly what I am looking for. I will have to test it, but thanks for the reply.

Jaysunn


I have modified the script to work with my variables. However, I am getting an xargs error.

I am on RHEL4 with bash. I checked the man page and did not see the -d option.

#!/bin/bash
server=podcast01.stm


ssh $server ls /export/home/podcast/data2/"$1" | ( cd /export/home/podcast2/"$1" && xargs -d"\n" echo rm -f)
[root@podcast2 bin]# ./remove_dups.sh kmox
xargs: invalid option -- d
Usage: xargs [-0prtx] [-E eof-str] [-e[eof-str]] [-I replace-str]
       [-i[replace-str]] [-L max-lines] [-l[max-lines]] [-n max-args]
       [-s max-chars] [-P max-procs] [--null] [--eof[=eof-str]]
       [--replace[=replace-str]] [--max-lines[=max-lines]] [--interactive]
       [--max-chars=max-chars] [--verbose] [--exit] [--max-procs=max-procs]
       [--max-args=max-args] [--no-run-if-empty] [--version] [--help]
       [command [initial-arguments]]

Thanks

The -d option sets the delimiter between input items (here, a newline, so each line is one filename). If your xargs does not have this option, you can omit it; it will work fine except for filenames with spaces.

That's understating the matter. Without -d, it will not work properly for filenames containing spaces, tabs, newlines, single quotes, double quotes, or backslashes.

You can improve the robustness of the pipeline by passing the output of ssh through

tr '\n' '\0'

and using xargs' -0 option. This will render it impervious to any characters except embedded newlines in filenames (which I assume is very unlikely to occur unless someone has been drinking and admining). If you retool to use `find -print0`, then there'd be no need for the tr filtering and even embedded newlines would be handled properly.
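
For anyone following along, here is a minimal sketch of that tr / xargs -0 pipeline. To keep it runnable without a remote host, the remote listing is simulated by a local function; list_remote, remote_dir, local_dir and the filenames are all hypothetical stand-ins, and in real use the function body would be the ssh call shown in the comment.

```shell
#!/bin/bash
# Stand-in directories so the sketch runs without a remote host (hypothetical).
remote_dir=$(mktemp -d)
local_dir=$(mktemp -d)
touch "$remote_dir/a.mp3" "$remote_dir/two words.mp3" "$remote_dir/it's.mp3"
touch "$local_dir/a.mp3" "$local_dir/two words.mp3" "$local_dir/it's.mp3" \
      "$local_dir/only local.mp3"

# Assumption: in real use this would be a single connection, e.g.
#   ssh "$server" ls /export/home/podcast/data2/"$1"
# or, to survive embedded newlines as well (find variant, no tr needed):
#   ssh "$server" "cd /some/remotedir && find . -maxdepth 1 -type f -print0"
list_remote() { ls "$remote_dir"; }

# One listing, converted to NUL-separated names, consumed by xargs -0.
# Spaces and quotes in names pass through untouched.
list_remote | tr '\n' '\0' | ( cd "$local_dir" && xargs -0 rm -f -- )

ls "$local_dir"   # prints: only local.mp3
```

As in the earlier post, putting echo in front of rm gives a dry run that shows what would be deleted before anything actually is.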

Also, the rm command in the original post needs some quoting to prevent field splitting damage.
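
A small sketch of what that field splitting does, using a scratch directory rather than the real paths. The directory and the filename are hypothetical, chosen only to show the effect.

```shell
#!/bin/bash
dir=$(mktemp -d)          # scratch directory, hypothetical stand-in
cd "$dir" || exit 1
file="two words.mp3"      # hypothetical filename containing a space
touch "$file"

# Unquoted: the shell splits $file into "two" and "words.mp3";
# neither exists, and rm -f stays silent about it.
rm -f $file
if [ -e "$file" ]; then echo "unquoted rm: file still there"; fi

# Quoted: rm receives the name as a single argument.
rm -f -- "$file"
if [ ! -e "$file" ]; then echo "quoted rm: file removed"; fi
```

The same reasoning would apply to the rm line in the original script: wrapping the whole path in double quotes keeps $1 and $file from being split.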

Regards,
Alister

Absolutely.

Actually, I have this tr command aliased (lineto0) but I didn't remember it.