Not sure if this is really in the right forum but here goes....
Looking for a way to extract individual files from a compressed tarball WITHOUT running tar -zxvf on the whole thing and then recompressing. Basically we need to be able to chunk out an individual file while the archive stays compressed.
For whatever reason, when we extract the single file and then recompress, it's destroying the file integrity. The files are so large that we can't just decompress the whole archive and then pick out pieces, so that's out.
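For what it's worth, here is a minimal sketch (all file and path names are hypothetical) of pulling a single member straight out of a .tar.gz. Note that tar still decompresses the stream internally to reach the member, but the archive on disk is never rewritten, so there is no re-tar/recompress step where integrity could be lost:

```shell
# Demo setup (hypothetical names): build a small compressed tarball.
mkdir -p demo
printf 'payload one\n' > demo/a.dat
printf 'payload two\n' > demo/b.dat
tar -czf big.tar.gz demo/a.dat demo/b.dat
rm -r demo

# List members to find the exact stored path:
tar -tzf big.tar.gz

# Extract a single member; the archive itself is untouched:
tar -xzf big.tar.gz demo/a.dat
cat demo/a.dat          # prints: payload one
```

This assumes a tar with gzip support (GNU or BSD tar); the member name must match the stored path exactly as shown by `tar -t`.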
We had looked at going in, carving out the bytes for one member, and writing them elsewhere, but from what I understand about how tar and gzip work, we'd get garbage: gzip compresses the tar stream as a whole rather than the individual members. Decompressing any piece would rely on the Huffman tables and back-references (the "rosetta stone") built up from everything earlier in the stream, right? Without that "rosetta stone", we'd not be able to accurately decompress the individual chunks, and a reverse algorithm wouldn't work either, because the original was encoded off patterns present in the whole file that may not be present in the individual piece.
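A quick way to confirm this intuition (names hypothetical): carve a slice of bytes out of the middle of a .tar.gz and try to decompress it on its own. It fails, because the slice has neither the gzip header nor the stream state that precedes it:

```shell
# Build a tiny compressed tarball.
printf 'some repetitive data data data\n' > member.txt
tar -czf whole.tar.gz member.txt

# Carve off everything from byte 100 onward and try to decompress it:
tail -c +100 whole.tar.gz > slice.bin
if gzip -dc slice.bin > /dev/null 2>&1; then
  echo "slice decompressed (unexpected)"
else
  echo "slice is garbage without the preceding stream state"
fi
```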
I'm a bit of a n00b, so I just need to check and make sure I've absorbed this all correctly. But, in case I've processed it all incorrectly and if there is a way or a script that can accomplish this, please point me in the right direction. Thanks.
Unfortunately, I don't get to determine the format. It's not a compressed tarball I made. It is ready-made and I have to make lemonade with it. Otherwise, I'd probably set something else up if it were up to me.
Ugh. Yeah, I was afraid that was the answer. I guess there's not much I can do on my end at this point but use process of elimination to figure out whether the corruption is creeping in on my side or in the others' work.
Instead of using the compressed tar file, uncompress and untar the entire archive once, then compress the individual files, then tar the individual compressed files. That would let you extract a file and uncompress only that file. It will also probably lower the risk of losing everything past a damaged spot in one large compressed stream. In fact, keeping a directory of the compressed individual files would allow "random access", because they would be available by filename.
The compression savings would probably differ from the original. Experimentation with a subset should allow you to estimate the difference.
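The repack described above might look something like this. It's a sketch with made-up names, and it assumes the archive holds a flat set of files (with subdirectories you'd want `gzip -r` or a `find`-based loop instead):

```shell
# Demo setup (hypothetical data): a compressed tarball of flat files.
mkdir -p src
printf 'alpha\n' > src/a.dat
printf 'beta\n'  > src/b.dat
tar -czf big.tar.gz -C src .
rm -r src

# One-time repack: a plain tar of individually gzipped files.
mkdir work
tar -xzf big.tar.gz -C work        # the single full extraction
gzip -9 work/*.dat                 # compress each file on its own
tar -cf big-indexed.tar -C work .  # uncompressed tar of .gz members
rm -r work

# Later: pull and decompress just one member, leaving the rest alone.
tar -xf big-indexed.tar ./a.dat.gz
gunzip a.dat.gz
cat a.dat                          # prints: alpha
```

Since the outer tar is uncompressed, tar can seek straight to the named member, and damage to one .gz member can't poison the others.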
What Operating System and version are you using?
How big is the largest archive (before and after compression)?
How big is the largest file (before and after compression)?
As others suggest, compressing an archive is foolish here, because you cannot extract an individual file without decompressing the stream at least up to that file.
Is fitting a lot more disc an option? In general there is little reason nowadays to compress files (disc space is cheap) unless you need to copy them across a network.