Best compression for log files?

I have been investigating a log file from one of my systems and the way I currently compress and rotate it. I am looking for something smarter than gzip, faster than bzip2, and able to match or beat "my script" (which is slow as heck, but gives WAY better compression ratios).

If 23 bytes per line are timestamp, that is roughly 33% of the file. If another 45% can be saved by doing a dictionary map/replace of the 25 most common phrases (I wrote a Python script to do the mapping), then there must be a compression program out there that can compress my logs without me needing to do this preprocessing before gzip... right? The bonus is that I wouldn't have to undo my changes on decompression.
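To make the phrase mapping concrete, here is a minimal sketch of one way it could work. It is not the actual script: it assumes every line starts with a 23-character timestamp, treats the whole rest of the line as the "phrase", and assumes the byte `\x01` never occurs in the log. The names `TS_LEN`, `MARK`, `build_table`, `encode` and `decode` are made up for illustration. The table is written as the first output line, so the substitution can be undone without outside knowledge.

```python
import json
from collections import Counter

TS_LEN = 23    # length of "2008-03-06 11:24:36.123" (assumed line layout)
MARK = "\x01"  # marker byte, assumed never to appear in the log text


def build_table(lines, top_n=25):
    """Return the top_n most common message bodies (text after the timestamp)."""
    counts = Counter(line[TS_LEN:] for line in lines if len(line) > TS_LEN)
    return [msg for msg, _ in counts.most_common(top_n)]


def encode(lines, table):
    """Replace known messages with MARK + table index; keep other lines as-is."""
    index = {msg: i for i, msg in enumerate(table)}
    out = [json.dumps(table)]  # first line carries the table itself
    for line in lines:
        stamp, msg = line[:TS_LEN], line[TS_LEN:]
        if msg in index:
            out.append(stamp + MARK + str(index[msg]))
        else:
            out.append(line)
    return out


def decode(encoded):
    """Undo encode() using the table stored on the first line."""
    table = json.loads(encoded[0])
    lines = []
    for line in encoded[1:]:
        stamp, rest = line[:TS_LEN], line[TS_LEN:]
        if rest.startswith(MARK):
            lines.append(stamp + table[int(rest[1:])])
        else:
            lines.append(line)
    return lines
```

The encoded lines can then be joined and handed to gzip as usual. Whether the extra pass is worth the time depends on how repetitive the messages really are.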

"My Script" does the following:

  1. Find the first timestamped line and convert the timestamp to a number (e.g. 2008-03-06 11:24:36.123 becomes 20080306112436123).
  2. Replace each subsequent timestamp with the difference between it and the previous stamped line (resulting in small numbers), written as a "base 72" number string.
  3. Look at everything on the line after the stamp: if the message repeats the one on the line above, replace the entire message with a hyphen (so only the first occurrence is actually stored).
  4. Finally, compress using "gzip --best", because it is 1000x faster than bzip2 (although bzip2 gives me a better ratio).
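The four steps above could be sketched roughly like this. It is an illustration under assumptions, not the real script: every line is assumed to start with a 23-character timestamp, timestamps are assumed never to go backwards, the delta is taken in real milliseconds (rather than by subtracting the concatenated digits), a tab separates the delta from the message, and the 72-symbol `ALPHABET` is invented here because the original one is not given. A decoder is included so the round trip can be checked.

```python
import gzip
import string
from datetime import datetime, timedelta

TS_LEN = 23
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"
EPOCH = datetime(1970, 1, 1)
# Hypothetical 72-symbol alphabet (10 digits + 52 letters + 10 punctuation marks).
ALPHABET = string.digits + string.ascii_letters + "!#$%&()*+,"


def to_ms(stamp):
    """Timestamp string -> integer milliseconds since EPOCH."""
    return (datetime.strptime(stamp, TS_FMT) - EPOCH) // timedelta(milliseconds=1)


def from_ms(ms):
    """Integer milliseconds since EPOCH -> timestamp string."""
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"


def to_base72(n):
    if n < 0:
        raise ValueError("timestamps must not go backwards in this sketch")
    digits = ""
    while True:
        n, r = divmod(n, 72)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits


def from_base72(s):
    n = 0
    for ch in s:
        n = n * 72 + ALPHABET.index(ch)
    return n


def encode(lines):
    out = [lines[0]]  # step 1: the first line is kept verbatim as the reference
    prev_ms, prev_msg = to_ms(lines[0][:TS_LEN]), lines[0][TS_LEN:]
    for line in lines[1:]:
        ms, msg = to_ms(line[:TS_LEN]), line[TS_LEN:]
        body = "-" if msg == prev_msg else msg               # step 3
        out.append(to_base72(ms - prev_ms) + "\t" + body)    # step 2
        prev_ms, prev_msg = ms, msg
    return out


def decode(encoded):
    lines = [encoded[0]]
    prev_ms, prev_msg = to_ms(encoded[0][:TS_LEN]), encoded[0][TS_LEN:]
    for item in encoded[1:]:
        delta, body = item.split("\t", 1)
        ms = prev_ms + from_base72(delta)
        msg = prev_msg if body == "-" else body
        lines.append(from_ms(ms) + msg)
        prev_ms, prev_msg = ms, msg
    return lines


def pack(lines):
    """Step 4: gzip at the highest level, the equivalent of "gzip --best"."""
    return gzip.compress("\n".join(encode(lines)).encode("utf-8"), compresslevel=9)


def unpack(blob):
    return decode(gzip.decompress(blob).decode("utf-8").split("\n"))
```

One caveat with this layout: a message that is literally a single hyphen would be mistaken for a repeat, so a real version would need an escape for that case.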

Any Ideas???

Compressing means keeping all the data as it was.

If you shrink the file a lot simply by removing stuff or by using a predetermined method for replacing redundancy, you are kind of creating your own Huffman encoding, but without a table, so it can't be reversed unless a human knows the drill.

Why don't you just write these files off to tape and delete them from disk? That would give the ultimate space savings. It will always take human intervention to expand and then interpret your hashed files anyway, so why not spend a little more time on the restore side and save time and lots of disk on the compression side? Or get really good archiving software --- :smile:

You might try rewriting the logs and converting the date to a binary value yourself, then gzip the result.
However, gzip already does quite a nice job of compression...
Whenever there is a chance to optimize something, think twice about whether it is worth it before doing anything.
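For what it is worth, a rough sketch of the binary-date idea, with a size comparison so you can check whether it pays off on your own data. The record layout (8-byte milliseconds since 1970 plus a 4-byte message length, both big-endian) and the names `HEADER`, `to_binary`, `from_binary` and `compare` are my own assumptions, as is the 23-character timestamp at the start of every line.

```python
import gzip
import struct
from datetime import datetime, timedelta

TS_LEN = 23
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"
EPOCH = datetime(1970, 1, 1)
HEADER = struct.Struct(">QI")  # 8-byte ms since EPOCH, 4-byte message length


def to_binary(lines):
    """Pack each line as: binary timestamp, message length, message bytes."""
    chunks = []
    for line in lines:
        dt = datetime.strptime(line[:TS_LEN], TS_FMT)
        ms = (dt - EPOCH) // timedelta(milliseconds=1)
        msg = line[TS_LEN:].encode("utf-8")
        chunks.append(HEADER.pack(ms, len(msg)) + msg)
    return b"".join(chunks)


def from_binary(blob):
    """Rebuild the original text lines from the packed records."""
    lines, pos = [], 0
    while pos < len(blob):
        ms, size = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        dt = EPOCH + timedelta(milliseconds=ms)
        stamp = dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"
        lines.append(stamp + blob[pos:pos + size].decode("utf-8"))
        pos += size
    return lines


def compare(lines):
    """Return (gzipped size of plain text, gzipped size of binary form)."""
    text = "\n".join(lines).encode("utf-8")
    return len(gzip.compress(text, 9)), len(gzip.compress(to_binary(lines), 9))
```

Run `compare()` on a real day of logs before committing to anything. It is quite possible that plain gzip on the text is already close enough that the extra step is not worth it.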