Re: PostgreSQL 8.4 performance tuning questions

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PFC <lists(at)peufeu(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PostgreSQL 8.4 performance tuning questions
Date: 2009-08-05 17:00:20
Message-ID: C69F08B4.E2EC%scott@richrelevance.com
Lists: pgsql-performance

On 8/5/09 7:12 AM, "Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote:

> On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Scott Carey <scott(at)richrelevance(dot)com> writes:
>>> There are a handful of other compression algorithms very similar to LZO in
>>> performance / compression level under various licenses.
>>> LZO is just the best known and most widely used.
>>
>> And after we get done with the license question, we need to ask about
>> patents.  The compression area is just a minefield of patents.  gzip is
>> known to avoid all older patents (and would be pretty solid prior art
>> against newer ones).  I'm far less confident about lesser-known systems.
>
> I did a little bit of research. LZO and friends are variants of LZW.
> The main LZW patent died in 2003, and AFAIK there have been no patent
> enforcement cases brought against LZO or its cousins (LZO dates to
> 1996). OK, I'm no attorney, etc, but the internet seems to believe
> that the algorithms are patent free. LZO is quite widely used, in
> both open source and some relatively high profile commercial projects.
>

That doesn't sound right to me. LZW is patent protected in a few ways, and
is an LZ78 scheme.

LZO, zlib, and the others here are LZ77 schemes which avoid the LZW patents.
There are some other patents in the territory with respect to how the hash
lookups are done for the LZ77 'sliding window' approach. Most notably,
using a tree is patented, and a couple other (obvious) tricks that are
generally avoided anyway for any algorithms that are trying to be fast
rather than produce the highest compression.
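
To make the 'sliding window plus hash lookup' idea concrete, here is a toy
sketch in Python. It is not LZO's or zlib's actual code: the token format,
the constants, and the names lz77_compress / lz77_decompress are all made up
for illustration, and a plain dict keyed on 3-byte sequences stands in for
the fixed-size hash table a fast implementation would use.

```python
MIN_MATCH = 3      # assumed shortest match worth encoding
WINDOW = 4096      # assumed sliding-window size in bytes
MAX_MATCH = 255    # assumed cap on match length

def lz77_compress(data: bytes):
    """Return tokens: ('lit', byte) or ('copy', distance, length)."""
    table = {}     # 3-byte sequence -> most recent position where it started
    tokens = []
    i, n = 0, len(data)
    while i < n:
        key = data[i:i + MIN_MATCH]
        cand = table.get(key) if len(key) == MIN_MATCH else None
        length = 0
        if cand is not None and i - cand <= WINDOW:
            # Extend the match; it may overlap the current position.
            while (i + length < n and length < MAX_MATCH
                   and data[cand + length] == data[i + length]):
                length += 1
        if length >= MIN_MATCH:
            tokens.append(("copy", i - cand, length))
            # Record every position covered by the match.
            for j in range(i, i + length):
                k = data[j:j + MIN_MATCH]
                if len(k) == MIN_MATCH:
                    table[k] = j
            i += length
        else:
            if len(key) == MIN_MATCH:
                table[key] = i
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Byte-by-byte copy so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

The single-entry table only remembers the latest occurrence of each 3-byte
sequence, which is the kind of cheap lookup the speed-oriented algorithms
favor over tree-based match finders.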

http://en.wikipedia.org/wiki/Lossless_data_compression#Historical_legal_issues
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://www.faqs.org/faqs/compression-faq/part1/section-7.html
http://www.ross.net/compression/patents.html

Note, US patents expire either 17 years after grant or 20 years after filing.
A very large chunk of those in this space have expired, but a few were
filed/granted in the early 90's -- though those are generally more specific
and easy to avoid, or very obvious duplicates of previous patents.
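
As a rough sketch of that term arithmetic (not legal advice): my
understanding is that applications filed before June 8, 1995 get the later
of the two dates, and later filings get 20 years from filing. The function
names and the example dates below are hypothetical, and term adjustments,
disclaimers, and lapsed maintenance fees are ignored.

```python
from datetime import date

# Assumed cutoff: filings before this date get the longer of the two terms.
CUTOFF = date(1995, 6, 8)

def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def nominal_expiry(filed: date, granted: date) -> date:
    """Nominal expiry date under the simplified rule described above."""
    if filed < CUTOFF:
        return max(add_years(granted, 17), add_years(filed, 20))
    return add_years(filed, 20)

# Hypothetical early-90's patent, not a real one.
example = nominal_expiry(filed=date(1990, 3, 1), granted=date(1992, 9, 15))
```

For those made-up dates, 17 years from grant has already passed but 20 years
from filing has not, so the later date governs.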

More notably, one of these, if interpreted broadly, would apply to zlib as
well (Gibson and Graybill) but the patent mentions LZRW1, and any broader
scope would have prior art conflicts with ones that are now long expired.
It's 17 years after grant on that, but not 20 years after filing.

> I downloaded the libraries and did some tests.
> 2.5 G sql dump:
>
> compression time:
> zlib: 4m 1s
> lzo: 17s
> fastlz: 28.8s
> liblzf: 26.7s
>
> compression size:
> zlib: 609M 75%
> lzo: 948M 62%
> fastlz: 936M 62.5%
> liblzf: 916M 63.5%
>

Interesting how that conflicts with some other benchmarks out there (where
LZO and the others are about the same). But, they're all an order of
magnitude faster than gzip/zlib.
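
To put numbers on that, here is a small Python sketch that works only from
the times and sizes quoted above. The dictionary and function names are
mine, and zlib's 4m 1s is taken as 241 seconds.

```python
# Figures copied from the quoted benchmark (2.5 G sql dump).
times_s = {"zlib": 4 * 60 + 1, "lzo": 17.0, "fastlz": 28.8, "liblzf": 26.7}
sizes_mb = {"zlib": 609, "lzo": 948, "fastlz": 936, "liblzf": 916}

def speedup_vs_zlib(name: str) -> float:
    """How many times faster than zlib the given library compressed."""
    return times_s["zlib"] / times_s[name]

def size_vs_zlib(name: str) -> float:
    """Output size relative to zlib's output (1.0 means the same size)."""
    return sizes_mb[name] / sizes_mb["zlib"]

for name in ("lzo", "fastlz", "liblzf"):
    print(f"{name}: {speedup_vs_zlib(name):.1f}x faster, "
          f"{size_vs_zlib(name):.2f}x the size of zlib output")
```

On these figures the three come out roughly 8x to 14x faster than zlib, with
output about 1.5x the size.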

> A couple of quick notes: liblzf produces (possibly) architecture
> dependent archives according to its header, and fastlz is not declared
> 'stable' according to its website.
>

> merlin
>
