I never imagined I would face so many problems with a directory archived the wrong way. In the end, I was able to convert a complete directory archive into a directory of archives. Here's the solution, for those interested:
Problem:
An entire directory was archived and compressed as a whole (tar + gzip), so it is almost impossible to extract data from particular files efficiently.
Constraint:
The archive is 13G and expands to 250G, but the disk capacity is only 50G.
Conventional Answer:
13G directory archive --> expands to 250G --> recompressed into a 13G directory of archives (not possible here: the 250G intermediate won't fit on a 50G disk)
Answer:
Convert the directory archive into a directory of archives, compressing files as they are extracted so the full 250G never exists on disk at once.
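Schematically, the transformation looks like this (file names hypothetical):

archive.tar.gz (13G, one opaque blob)
-->
extracted/
  file1.extension.gz
  file2.extension.gz
  ...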
Solution:
Step 1:
Prepare a shell script, checkAndGzip.sh, and place it in the directory where the archive will be extracted:
#!/bin/bash
# Compress every extracted file that is no longer being written to.
find . -name "*.extension" | while read -r FILE
do
    # lsof lists processes that have the file open; the FD column (4th)
    # ends in "w" when the file is open for writing.
    inuse=$(lsof "$FILE" 2>/dev/null | awk 'NR > 1 && $4 ~ /w/ {print $4}')
    if [ -z "$inuse" ]; then
        # The file is not being written to, so it is safe to gzip it.
        gzip "$FILE"
    fi
done
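Before wiring the script to cron, it's worth a quick manual test (assuming it lives in the extraction directory, as above):

chmod +x checkAndGzip.sh
./checkAndGzip.sh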
Note: Observe the usage of lsof, a nice utility that reports which processes have a file open; this is how the script tells whether a file is still in use.
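For illustration, lsof output for a file still being written by tar looks roughly like this (PID, device, and path are hypothetical); the 4th column is the FD field the awk filter inspects, and the trailing "w" marks write access:

COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
tar     4211 user    3w   REG  253,0  1048576 9437201 ./data/file.extension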
Step 2:
Set up a cron job such as:
*/2 * * * * /path/to/checkAndGzip.sh > output
Note: this cron job runs every two minutes.
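One way to install the entry non-interactively (assuming the script sits at /path/to/checkAndGzip.sh) is to append it to the existing crontab:

(crontab -l 2>/dev/null; echo '*/2 * * * * /path/to/checkAndGzip.sh > output') | crontab -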
Step 3:
Run this command in the extraction directory:
gunzip -c archive.tar.gz | tar -xvf -
Logic: The logic is pretty simple. On the one hand, the extraction takes place; on the other, the cron job executes a shell script that checks whether new files have been generated and gzips them, so the full 250G never accumulates on disk. The reason we use lsof is to verify whether a file is still being extracted (gzip itself doesn't seem to care about partial files); if a file is in use, it is skipped during that run and picked up on a later one.
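Once everything is compressed, individual files can be inspected without expanding the whole data set again; for example (file name hypothetical):

zcat file1.extension.gz | less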
If anyone has a better solution, or an improvement to the one above, kindly suggest it.