Remove duplicate files in same directory

Hi all.

I'm doing continuous backups of mailboxes using rsync,
so whenever a new mail arrives it is automatically copied to the backup server.
When a new mail arrives it is named xyz:2,; when it is read by the email client, an S is appended: xyz:2,S.
Eventually, two copies of the same file exist on the backup server with different names, while on the mail server only xyz:2,S exists.

e.g.
on the mail server:
xyz:2,RS

on the backup server:
xyz:2,
xyz:2,S
xyz:2,RS

So in one directory I can have three copies of file xyz and two copies of file abc.
Can anyone help me remove the older files (xyz:2,) and keep only the most recent one on the backup server?

Thanks :(

I am very new to Unix, but will it help if you use:
ls -rt | tail -1 ==> this will get you the most recent file
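For illustration, a quick sketch of that idea run inside the mailbox directory (the xyz group name is hypothetical):

ls -rt xyz:2,* | tail -1    # oldest-first listing, so tail -1 prints the newest copy of the group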

rsync with --delete?

     --delete                delete extraneous files from dest dirs
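For example, a sketch of such an invocation (host and paths are hypothetical):

rsync -av --delete /var/vmail/ backup:/var/vmail-backup/
# --delete removes files from the destination that no longer exist on the source,
# so the stale xyz:2, copy would be cleaned up once the source renames it to xyz:2,S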

If a mail file is deleted from the mail server, --delete will delete it on the backup server as well.
I'm running rsync regularly, and files on the backup server are deleted by a script only once they reach a certain age, e.g. 10 days.

:(

---------- Post updated at 06:39 PM ---------- Previous update was at 05:58 PM ----------

I tried on the following files:

Output is:

Still ... :(

Can anyone please help me? Thanks.

coolatt, if I'm reading this right, you want to delete from the backup server, for each xyz:2,* group, all but the most recent file belonging to that group.

I recreated your directory with the following files. Please note the order in which they were created (ls -rt):

1265199975.P6583Q0M174865.ecs,S=623:2,
1265199975.P6583Q0M174865.ecs,S=623:2,F
1265198625.P6233Q0M875762.ecs,S=639:2,S
1265199975.P6583Q0M174865.ecs,S=623:2,S
1265198625.P6233Q0M875762.ecs,S=639:2,FS
1265198625.P6233Q0M875762.ecs,S=639:2,F

I created a script looking like this ($FILEDIR being the directory on the backup server holding the files to be checked):

#!/bin/bash

# List the files oldest-first; the last line of each group is the newest copy.
ls -rt "$FILEDIR" > filelist.txt

# Strip the :2,<flags> suffix to get one base name per message.
awk -F ':2,' '{print $1}' filelist.txt | sort -u > searchbase.txt

# For each base name, print every matching file except the newest one.
# grep -F matches the name literally (its dots would otherwise be regex wildcards),
# and head --lines=-1 (GNU head) drops the last, i.e. most recent, match.
while read -r line
do
        grep -F "$line" filelist.txt | head --lines=-1
done < searchbase.txt

Please let me know if this outputs the correct files (it does for me).

If it does, I reckon simply adding a | xargs rm after the head command should delete the older files.
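A minimal sketch of that deletion variant, assuming GNU head and that $FILEDIR is set; note that rm needs the directory prefixed, since filelist.txt contains bare file names:

while read -r line
do
        # everything but the newest file of the group gets removed
        grep -F "$line" filelist.txt | head --lines=-1 | while read -r old
        do
                rm -- "$FILEDIR/$old"
        done
done < searchbase.txt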

BTW, the script also works if you have just one file for a group (say the original xyz:2, file), in the sense that it will not delete the backup.

Thanks, cmf1985, for the script.

I ran your script on a directory containing the following files:

I got the following output:

However, as you can see, it didn't work for the following group:

1265202446.P7209Q1M203877.ecs,S=623:2,S
1265202446.P7209Q2M203877.ecs,S=623:2,S
1265202446.P7209Q3M203877.ecs,S=623:2,S

---------- Post updated at 02:22 PM ---------- Previous update was at 12:23 PM ----------

It is not working as expected :(
I found another problem when I ran the script (the red+green highlighting from the original post):

On the mail server I have the following email files (output of # ls -ltc):

On the backup server I have the following email files (output of # ls -ltc):

After running the script on the backup server:

The script must keep the three files that exist on both the mail server and the backup server,
and delete the rest from the backup server.

But I think the problem is related to the timestamps of the files.

Please advise. Thanks.

I'm beginning to get a bit confused about what you want... Do you want to have the same files on the mail server and the backup server (as in, get a list of all the files currently on the mail server and delete from the backup server anything that's not in that list)?

Yes.

For example, if I have xyz and abc on the mail server, and xyzRS, xyzR, xyz, abcRSF, abcR, abc
on the backup server, then
xyzRS, xyzR, abcRSF and abcR must be deleted from the backup server.

Eventually, on the mail server: xyz and abc
and on the backup server: xyz and abc
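A minimal sketch of that approach, assuming you can copy a file list over from the mail server (e.g. with scp), with $MAILLIST and $FILEDIR as hypothetical placeholders; grep -vxF prints every backup file whose name does not exactly match a line of the mail server list:

#!/bin/bash
# $MAILLIST: file names from the mail server, one per line,
#            e.g. generated there with: ls /path/to/cur > maillist.txt
# $FILEDIR:  the corresponding mailbox directory on the backup server

ls "$FILEDIR" | grep -vxF -f "$MAILLIST" | while read -r extra
do
        rm -- "$FILEDIR/$extra"
done

# Careful: if $MAILLIST is empty, every file on the backup would be deleted.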