Sorting using a substring

tvsubhaskar · March 24, 2011, 1:08am

Dear all,

I want to list all duplicate files across all subdirectories. I used the following command and it worked fine:

find . -type f -print | sort -i

This gives the following sample output:

./out
./cas/catch.dat
./cas/File1.dat
./baab/bumber.dat
./baab/File1.dat
./uday/hahah.dat
./uday/samp/CAS/test.txt
./uday/samp/File1.dat
./uday/bhas/File1.dat
./uday/File1.dat
./aaaa/sample.dat
./aaaa/File1.dat
./aaaa/a12/File1.dat

============================
However, I want to sort this so that all the duplicate files are listed on a single line. That is, I want to sort based on a substring of each line.

Specifically, I want to sort on the substring from the last occurrence of "/" to the end of the line. Is there a straightforward way to do this?

Thanks in advance,
Uday

Ygor · March 24, 2011, 2:05am

Try...

find . -type f -print | awk -F/ '{print $NF, $0}' | sort -i
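Note that this pipeline leaves the basename in front of each path. A minimal sketch, assuming paths contain no tabs or newlines, that uses the basename only as a temporary sort key and strips it again afterwards (decorate, sort, undecorate). The function name `sort_by_basename` is hypothetical. It uses `sort -f` to fold case; `sort -i` only ignores nonprinting characters.

```shell
# Hypothetical helper: sort paths read on stdin by their basename,
# ignoring case, then print the original paths without the key.
# A tab separates the temporary key from the path, so paths must
# not contain tabs or newlines.
sort_by_basename() {
  tab=$(printf '\t')
  awk -F/ '{ printf "%s\t%s\n", $NF, $0 }' |   # decorate: "basename<TAB>path"
  LC_ALL=C sort -f -t "$tab" -k1,1 |           # sort on the basename, case-folded
  cut -f2-                                     # undecorate: drop the key
}
```

Usage: `find . -type f | sort_by_basename`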

cgkmal · March 24, 2011, 2:20am

Hi tvsubhaskar,

If you only want to show all file names from the sample output on a single line:

awk '{sub(".*/",""); sub("^","/");printf "%s ", $0}' inputfile
/out /catch.dat /File1.dat /bumber.dat /File1.dat /hahah.dat /test.txt /File1.dat /File1.dat /File1.dat /sample.dat /File1.dat /File1.dat

If you only want to show the unique file names on a single line (names without a dot, such as "out", are skipped):

awk -F"/" '$NF~/\./{R[$NF]=1}END{for(i in R) printf "%s ", "/"i}' inputfile
/hahah.dat /test.txt /bumber.dat /sample.dat /File1.dat /catch.dat

Or show only the unique file names, with the number of times each appears in parentheses:

awk -F"/" '$NF~/\./{R[$NF]++}END{for(i in R) printf "%s ", "/"i"("R[i]")"}' inputfile
/hahah.dat(1) /test.txt(1) /bumber.dat(1) /sample.dat(1) /File1.dat(7) /catch.dat(1)

Or show only duplicate files (those that appear more than once), with the number of times each appears in parentheses:

awk -F"/" '$NF~/\./{R[$NF]++}END{for(i in R) if(R[i]>1) printf "%s ", "/"i"("R[i]")"}' inputfile
/File1.dat(7)
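The same duplicate count can also be obtained with `sort` and `uniq -c` instead of an awk array. A minimal sketch (the name `dup_counts` is hypothetical), assuming basenames contain no whitespace; unlike the one-liners above, it also counts names without a dot such as "out", and it prints the results in sorted order:

```shell
# Hypothetical helper: print each basename that occurs more than once,
# followed by its count in parentheses. Reads paths on stdin.
# Assumes basenames contain no whitespace, because the final awk
# splits uniq's "count name" output on whitespace.
dup_counts() {
  awk -F/ '{ print $NF }' |          # keep only the basename
  LC_ALL=C sort |                    # bring identical names together
  uniq -c |                          # "   N name" for each distinct name
  awk '$1 > 1 { print $2 "(" $1 ")" }'
}
```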

Hope it helps,

Regards

pravin27 · March 24, 2011, 2:47am

while IFS= read -r file; do printf '%s %s\n' "${file%/*}" "${file##*/}"; done < inputfile | sort -k2 |
awk '{ printf "%s%s/%s", ($2 == a ? " " : (NR > 1 ? "\n" : "")), $1, $2; a = $2 } END { print "" }'
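The loop relies on POSIX parameter expansion to split each path: `%` removes the shortest matching suffix and `##` removes the longest matching prefix. A small illustration with a hypothetical variable `path`:

```shell
# Split a path into its directory and basename without calling
# dirname/basename (hypothetical example path).
path=./uday/samp/File1.dat
dir=${path%/*}     # remove shortest suffix matching "/*"  -> ./uday/samp
base=${path##*/}   # remove longest prefix matching "*/"   -> File1.dat
printf '%s\n' "$dir" "$base"
```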

un1xl0ver_rwx · March 24, 2011, 3:21am

I guess you want duplicate files listed on the same line.

$ awk -F'/' '{++a[$NF]; c[$NF]=c[$NF] " " $0} END{for(i in c) if (a[i]>1) print c[i], a[i]}' input
 ./aaaa/sample.dat ./sample.dat 2
 ./baab/bumber.dat ./test/tes/baab/bumber.dat ./baab/bumber.dat 3
 ./cas/File1.dat  ./baab/File1.dat ./uday/samp/File1.dat ./uday/bhas/File1.dat  ./uday/File1.dat ./aaaa/File1.dat ./aaaa/a12/File1.dat 7
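Since awk's `for (k in ...)` loop visits keys in an unspecified order, these groups can come out in any order. A sketch of the same idea with sorted output (the name `group_dups` is hypothetical), assuming paths contain no newlines:

```shell
# Hypothetical helper: print one line per duplicated basename, listing
# the full paths followed by the number of occurrences.
# Input is sorted first so paths inside a group appear in order; the
# output is sorted because awk's for-in order is unspecified.
group_dups() {
  LC_ALL=C sort |
  awk -F/ '{ n[$NF]++; paths[$NF] = paths[$NF] " " $0 }
           END { for (k in n) if (n[k] > 1) print substr(paths[k], 2), n[k] }' |
  LC_ALL=C sort
}
```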

Hope this helps.

summer_cherry · April 7, 2011, 5:51am

find . -type f | sort -t"/" -k3
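With "/" as the separator, field 3 is the basename only for files exactly one directory deep (for example ./cas/File1.dat, but not ./uday/samp/File1.dat). One depth-independent sketch, assuming `rev` is available (it comes from util-linux or BSD and is not POSIX), reverses each line so the basename becomes field 1. Identical basenames end up adjacent, although the groups are ordered by the reversed name rather than alphabetically. The name `group_by_basename_rev` is hypothetical.

```shell
# Hypothetical helper: reverse each path so the basename becomes the
# first "/"-separated field, sort on that field, then reverse back.
# Works at any directory depth; requires rev (util-linux/BSD).
group_by_basename_rev() {
  rev | LC_ALL=C sort -t/ -k1,1 | rev
}
```

Usage: `find . -type f | group_by_basename_rev`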