Re: Faster compression, again

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Faster compression, again
Date: 2012-03-14 20:10:29
Message-ID: CAHyXU0xaouYYRLNSTkpAXPReO3imT3YDWd7L39TgN6dKfF8t8w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 14, 2012 at 1:06 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> For 9.3 at a minimum.
>
> The topic of LZO became mired in doubts about:
>
> * Potential Patents
> * The author's intention for the implementation to be GPL
>
> Since then, Google released "Snappy," also an LZ77-class
> implementation, and it has been ported to C (recently, and with some
> quirks, like no LICENSE file...yet, although it is linked from the
> original Snappy project).  The original Snappy (C++) has a BSD license
> and a patent grant (which shields you from Google, at least).  Do we
> want to investigate a very-fast compression algorithm inclusion again
> in the 9.3 cycle?
>
> I've been using the similar implementation "LZO" for WAL archiving and
> it is a significant savings (not as much as pg_lesslog, but also less
> invasive).  It is also fast enough that even if one were not to uproot
> TOAST's compression that it would probably be very close to a complete
> win for protocol traffic, whereas SSL's standardized zlib can
> definitely be a drag in some cases.
>
> This idea resurfaces often, but the reason why I wrote in about it is
> because I have a table which I categorized as "small" but was, in
> fact, 1.5MB, which made transferring it somewhat slow over a remote
> link.  zlib compression takes it down to about 550K and lzo (similar,
> but not identical) 880K.  If we're curious how it affects replication
> traffic, I could probably gather statistics on LZO-compressed WAL
> traffic, of which we have a pretty huge amount captured.

there are plenty of non-gpl lz based libraries out there (for example:
http://www.fastlz.org/) and always have been. they are all much
faster than zlib. the main issue is patents...you have to be careful
even though all the lz77/78 patents seem to have expired or apply to
specifics not relevant to general use.
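
a rough way to check speed and ratio numbers like these on your own data
is sketched below. it only measures zlib, because that is what the python
standard library ships; lzo, snappy and fastlz need third-party bindings
and are not shown. the function name measure_zlib and the sample payload
are made up for illustration and are not from any of the libraries
discussed here.

```python
import time
import zlib


def measure_zlib(payload: bytes, level: int = 6) -> dict:
    """Compress payload with zlib and report size ratio and elapsed time.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Round-trip check so a ratio is never reported for corrupt output.
    assert zlib.decompress(compressed) == payload
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    # Assumed sample: repetitive row-like text, not real table data.
    sample = b"".join(
        b"%d,some_name_%d,2012-03-14\n" % (i, i % 50) for i in range(50000)
    )
    for level in (1, 6, 9):
        stats = measure_zlib(sample, level)
        print(level, stats["compressed_bytes"],
              round(stats["ratio"], 3), round(stats["seconds"], 4))
```

running the same measurement with a binding for one of the lz libraries
on the same payload would give a like-for-like comparison; the numbers
depend heavily on the data, so synthetic input like the above is only a
smoke test.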

see here for the last round of talks on this:
http://archives.postgresql.org/pgsql-performance/2009-08/msg00052.php

lzo is nearing its 20th birthday, so even if you are paranoid about
patents (admittedly, there is good reason to be), the window is
closing fast for patent issues that aren't either (a) expired or
(b) covered by prior art, whether on lzo itself or on the various
copycat implementations, at least in the US.

snappy looks amazing.

merlin
