I used the gzip command to compress a huge tar file, but I saw that the compression % was more than 100%.
It might have inflated instead, probably because the tar file was already packed tightly.
So I thought of unzipping it. Since the reported compression was over 100%, I expected the uncompressed tar to come out smaller than the .tar.gz file. But to my surprise it was bigger. Does that mean gzip actually reduced the size even though it showed a compression % of more than 100%?
Here are the statistics:
After gzip:
a.tar.gz - 20,915,558,979
After gunzip:
a.tar - 22,213,027,840
Compression % = 175.7
(Sorry, I forgot to check the size of the original tar, I mean the tar before I gzipped it.)
There is of course some overhead, but very little since, as you noted, gzip is smart enough to fall back to storing data uncompressed when faced with a file that compresses badly. It's much better than some older compressors which, in the worst case, could double the size of a file.
I have no idea where the 175% comes from; it makes no apparent sense either way you consider it.
Compression is always performed, even if the compressed file is slightly larger than the original. The worst case expansion is a few bytes for the gzip file header, plus 5 bytes every 32K block, or an expansion ratio of 0.015% for large files. Note that the actual number of used disk blocks almost never increases. gzip preserves the mode, ownership and timestamps of files when compressing or decompressing.
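For what it's worth, you can watch that worst case happen by feeding gzip incompressible data (a sketch, assuming GNU tools; /dev/urandom supplies random bytes, and the exact sizes will differ slightly from run to run):

$ head -c 1000000 /dev/urandom > random.bin   # 1 MB of incompressible data
$ gzip -c random.bin | wc -c                  # slightly MORE than 1000000: header plus ~5 bytes per 32K block

The growth is a few hundred bytes, nowhere near 75% expansion.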
What you show does not indicate 175% of anything, and the documentation quoted above says 'no way' to expansion on that scale.
jim> bc -l
20915558979 / 22213027840
.94158973417106202123
So, I think:
Your problem is that whatever you used to do the % calculation overflowed 32-bit integers and gave garbage results. Those are 20 GB files, far past the ~2.1 GB limit of a signed 32-bit counter.
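If you want to see how a 32-bit counter mangles sizes like these, bash's 64-bit arithmetic lets you simulate the truncation with a mask (a sketch of the overflow theory only; I am not claiming this is how the 175.7 was actually produced):

$ echo $(( 22213027840 & 0xFFFFFFFF ))   # the tar size as a 32-bit counter sees it
738191360
$ echo $(( 20915558979 & 0xFFFFFFFF ))   # the .tar.gz size, likewise truncated
3735689795

Any percentage computed from numbers like those is garbage.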
In fact, you appear to have had about 6% compression.
That 6% was probably because there were a lot of executables/binary data files in the tar file. Those do not compress as well as text; "normal" compression on text is on the order of 70%.
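As a rough demonstration of the difference (a sketch; /usr/share/dict/words is just a convenient large text file and may live elsewhere on your system, or substitute any big log file):

$ wc -c < /usr/share/dict/words           # size of the plain text
$ gzip -c /usr/share/dict/words | wc -c   # typically around 30% of the original, i.e. ~70% compression
$ gzip -c /bin/ls | wc -c                 # an executable typically shrinks much less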
You may have a 32-bit build of gzip; the file command will show you whether the binary is 32-bit or 64-bit. I would guess 32-bit in your case.
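On a 32-bit build you would see something like this (hypothetical output; the fields after 'ELF' vary by platform, the part to look at is '32-bit' vs '64-bit'):

$ file $(which gzip)
/usr/bin/gzip: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped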
Not really... It's not supposed to know or care how long the file is, it just reads and writes and churns until the OS says 'ok, all done'. If a stream compressor can't handle input of arbitrary length, that's a bug.
So the code that broke down here technically had nothing to do with the compression. Fortunately.
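A quick way to convince yourself that gzip really is a pure stream filter (a sketch; /dev/zero just supplies an arbitrarily long input whose length gzip cannot know in advance):

$ head -c 100000000 /dev/zero | gzip | wc -c   # ~100 MB streamed in; only a tiny compressed count comes out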