Find duplicate files but with different extensions

Hi!

I wonder if anyone can help with this. I have a directory:

/xyz

that has the following files:

chsLog.107.20130603.gz
chsLog.115.20130603
chsLog.111.20130603.gz
chsLog.107.20130603
chsLog.115.20130603.gz

As you can see, some files come in pairs that are the same except for one minor difference: the .gz extension.
How can I find those files? I have tried the find command, but it is not very helpful:

find . -name '*.gz'

That only finds filenames with the .gz extension.

What's your expected output?

I am expecting to see:

chsLog.107.20130603.gz
chsLog.107.20130603
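Since find already lists the .gz files, one way to get that output is to check, for each .gz file, whether the same name without the .gz suffix also exists. A minimal sketch in POSIX sh; the function name gz_pairs is made up for illustration, and it searches recursively from the path you give it (default: the current directory):

```shell
#!/bin/sh
# Sketch: for every .gz file, print it and its uncompressed twin
# (same path without the trailing ".gz"), but only if the twin exists.
# gz_pairs is a hypothetical helper name; pass the directory to search.
gz_pairs() {
    find "${1:-.}" -type f -name '*.gz' -exec sh -c '
        for gz do
            plain=${gz%.gz}            # strip the trailing .gz
            if [ -e "$plain" ]; then   # does the twin exist?
                printf "%s\n" "$gz" "$plain"
            fi
        done
    ' sh {} +
}
```

For the directory in the question, run gz_pairs /xyz. It prints both names of the 107 and 115 pairs (with the /xyz/ prefix) in whatever order find visits them, and skips chsLog.111.20130603.gz because it has no twin.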

Try this:

find|awk '/gz$/{sub(/.gz$/,"")gz[$1]++;next}{a[$1]++}END{for(i in a)if(gz)print i}'

I get this error message:

 find|awk '/gz$/{sub(/.gz$/,"")gz[$1]++;next}{a[$1]++}END{for(i in a)if(gz)print i}'
find: insufficient number of arguments
find: [-H | -L] path-list predicate-list
awk: syntax error near line 1
awk: illegal statement near line 1

Adjust the find part to suit your system. Your find requires a path, so change it to find . to list all files in the current directory and its subdirectories.
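Besides the missing path, the one-liner has a few other problems: there is no ; after sub(...), if(gz) tests the array itself rather than whether i is in it (if(i in gz)), and the unescaped . in /.gz$/ matches any character. The awk "syntax error" lines also suggest an old awk, such as /usr/bin/awk on Solaris (a guess based on the find error format); nawk or /usr/xpg4/bin/awk may work there. A corrected sketch, assuming a POSIX awk and using the hypothetical function name gz_twins, that prints each name found both with and without .gz:

```shell
#!/bin/sh
# Sketch of the find|awk idea with the path given and the lookup fixed.
# Assumption: "awk" is a POSIX awk (on Solaris, try nawk instead).
# gz_twins is a hypothetical helper name; pass the directory to search.
gz_twins() {
    find "${1:-.}" -type f | awk '
        /\.gz$/ { sub(/\.gz$/, ""); gz[$0] = 1; next }  # remember names with .gz stripped
                { plain[$0] = 1 }                        # remember plain names
        END     { for (f in plain) if (f in gz) print f } # print names present in both sets
    '
}
```

Using $0 instead of $1 keeps whole lines, so filenames containing spaces are not cut short. The output order of for (f in plain) is unspecified, so pipe through sort if order matters.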


This will work as well:

comm -12 <(ls chsLog*[0-9]) <(ls chsLog*gz|sed 's/.gz$//')
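The comm version relies on process substitution, <(...), which bash, ksh and zsh support but plain sh does not, and on comm receiving both lists in the same sorted order. A sketch of the same idea for plain sh, using temporary files, sorting explicitly under LC_ALL=C, and escaping the dot in the sed pattern; comm_twins is a hypothetical name, and mktemp -d is assumed to be available:

```shell
#!/bin/sh
# Sketch: the comm approach without bash-only process substitution.
# comm needs both inputs sorted the same way, so sort them explicitly
# under one locale (LC_ALL=C) instead of relying on ls order.
# Assumptions: run from the directory holding the chsLog files;
# mktemp -d is available; comm_twins is a hypothetical helper name.
comm_twins() {
    tmp=$(mktemp -d) || return 1
    # Plain files end in a digit; ls errors (no match) are discarded.
    ls chsLog*[0-9] 2>/dev/null | LC_ALL=C sort > "$tmp/plain"
    # Strip a literal ".gz" suffix from the compressed names.
    ls chsLog*.gz 2>/dev/null | sed 's/\.gz$//' | LC_ALL=C sort > "$tmp/gz"
    # -12 suppresses lines unique to either file, leaving the common names.
    LC_ALL=C comm -12 "$tmp/plain" "$tmp/gz"
    rm -rf "$tmp"
}
```

Because the ls errors are discarded, this prints nothing (rather than an error) when no .gz files exist yet.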

Hi,

Is the syntax of your comm command correct? ls chsLog*gz does not look right, does it?

I don't know what you mean. Did you try the command? For me it works:

$ comm -12 <(ls chsLog*[0-9]) <(ls chsLog*gz|sed 's/.gz$//')
chsLog.107.20130603
chsLog.115.20130603

It did not work at the moment, because right now there are no files with the .gz extension, but thanks anyway.