Efficient Detection of Large Scale Redundancy in Enterprise File Systems

HPL-2008-30 (R.2) Efficient Detection of Large Scale Redundancy in Enterprise File Systems - Forman, George; Eshghi, Kave; Suermondt, Jaap
Keyword(s): data mining, min-hashing, set sketches, directory similarity and deduplication, file systems, scalability, storage management.
Abstract: In order to catch and reduce waste in the exponential demand for disk storage, we have developed a technology based on set sketches that enables enterprise storage managers to efficiently detect approximate duplication of large directory hierarchies, e.g. unnecessary mirroring by uncoordinated emplo ...
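Illustration: the keywords name min-hashing and set sketches as the basis for comparing directories. The sketch below shows that general idea only, not the report's actual algorithm. It assumes each directory is represented as a set of strings (for example, file content digests), and the names `minhash_sketch`, `estimate_similarity`, and `num_hashes` are hypothetical.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Hash an item to a 64-bit integer; the seed selects one hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(items, num_hashes=128):
    """Return a fixed-size sketch: the minimum hash of the set under each seed."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch positions."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)


# Hypothetical directories, each described by the set of its file identifiers.
dir_a = {f"file{i}" for i in range(100)}
dir_b = {f"file{i}" for i in range(50, 150)}  # shares 50 of 150 distinct files

estimate = estimate_similarity(minhash_sketch(dir_a), minhash_sketch(dir_b))
```

Because each sketch has a fixed size regardless of how many files a directory holds, two large hierarchies can be compared without comparing their full file lists. The estimate approaches the true Jaccard similarity (here 50/150) as `num_hashes` grows.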