I have many PDFs scattered across 4 machines. There is 1 location where I maintain my other PDFs. The issue is that the 4 machines may have duplicate PDFs among themselves, but I want just 1 copy of each so that they can be transferred to that 1 location.
What I have thought is:
1) I have designed a script that will scan each of the 4 machines and print the list of PDF files into a text file named list.txt.
2) So now I have all the PDFs listed in the list.txt file.
3) I need a shell script that will now check this list and find the duplicate files, so that I know where they are located and have them grouped together (see the sketch further down).
The list.txt contains the full path along with the file name, so I guess we only have to compare the trailing file-name part ending in ".pdf".
Please help me do this.
The list.txt, which is already generated, looks like below.
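A minimal sketch of step 3, assuming list.txt holds one full path per line: awk splits each path on "/", takes the last field as the file name, and prints only the names that occur on more than one path, together with every location.

Code:
#!/bin/sh
# Group the paths in list.txt by file name (the last /-separated
# field) and print only the names that occur more than once.
awk -F'/' '
{
    name = $NF                          # basename of this path
    count[name]++
    paths[name] = paths[name] $0 "\n"   # remember every location
}
END {
    for (name in count)
        if (count[name] > 1) {          # duplicates only
            printf "Duplicate: %s\n", name
            printf "%s\n", paths[name]
        }
}' list.txt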
Thanks for the explanation, I needed that. Also, what modification is needed to display the non-duplicate files as well, but only after all the duplicate ones are displayed?
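One possible modification, under the same one-path-per-line assumption: collect everything in a single pass, then have the END block print the duplicate groups first and the unique files after them.

Code:
#!/bin/sh
# Same grouping as before, but the END block makes two passes over
# the names: first the duplicates, then the files seen only once.
awk -F'/' '
{
    name = $NF
    count[name]++
    paths[name] = paths[name] $0 "\n"
}
END {
    for (name in count)
        if (count[name] > 1)
            printf "Duplicate: %s\n%s\n", name, paths[name]
    for (name in count)
        if (count[name] == 1)
            printf "Unique: %s", paths[name]
}' list.txt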
I now have a list of duplicate files, but the issue is that I need to eliminate only the ones whose contents are the same, not the ones that are different but still have the same name.
For example, if the files are
david/project1/symbiosys.pdf
tom/project1/symbiosys.pdf
and both are working on the same project, the PDFs may be identical, but I need to be sure, maybe by an md5 checksum or something similar that can be found out.
But if the file size differs, I need to save both of them in 2 different folders to prevent one from overwriting the other.
Any suggestions or help with the shell script would be appreciated.
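A minimal sketch of that idea, assuming the duplicate paths are reachable from the machine running the script (e.g. the trees are mounted locally) and are listed one per line in a file called dups.txt (a hypothetical name, as is the destination /data/pdf_store). Files with equal md5 checksums are treated as true duplicates and copied once; files that merely share a name go into separate alt_N folders so nothing is overwritten.

Code:
#!/bin/sh
# Keep one copy per distinct md5 checksum; same-named files with
# different contents land in separate folders, never overwritten.
# dups.txt and DEST are hypothetical names; adjust to your setup.
DEST=/data/pdf_store
mkdir -p "$DEST"

seen=""
while IFS= read -r path; do
    sum=$(md5sum "$path" | awk '{print $1}')
    name=$(basename "$path")
    case " $seen " in
        *" $sum "*) continue ;;      # identical content already copied
    esac
    seen="$seen $sum"
    target="$DEST/$name"
    n=1
    while [ -e "$target" ]; do       # same name, different content:
        mkdir -p "$DEST/alt_$n"      # keep it in a separate folder
        target="$DEST/alt_$n/$name"
        n=$((n + 1))
    done
    cp "$path" "$target"
done < dups.txt

Comparing file sizes first (e.g. with wc -c) is a cheap pre-filter; only same-named files whose sizes match need the full md5 pass.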