Re: pluggable compression support

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: pluggable compression support
Date: 2013-06-25 18:42:30
Message-ID: 20130625184230.GH7716@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-06-25 12:22:31 -0400, Robert Haas wrote:
> On Thu, Jun 20, 2013 at 8:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-06-15 12:20:28 +0200, Andres Freund wrote:
> >> On 2013-06-14 21:56:52 -0400, Robert Haas wrote:
> >> > I don't think we need it. I think what we need to decide is which
> >> > algorithm is legally OK to use. And then put it in.
> >> >
> >> > In the past, we've had a great deal of speculation about that legal
> >> > question from people who are not lawyers. Maybe it would be valuable
> >> > to get some opinions from people who ARE lawyers. Tom and Heikki both
> >> > work for real big companies which, I'm guessing, have substantial
> >> > legal departments; perhaps they could pursue getting the algorithms of
> >> > possible interest vetted. Or, I could try to find out whether it's
> >> > possible to do something similar through EnterpriseDB.
> >>
> >> I personally don't think the legal arguments hold all that much water
> >> for snappy and lz4. But then the opinion of a European non-lawyer doesn't
> >> carry much weight either.
> >> Both are widely used by a large number of open and closed projects, some of
> >> which have patent grant clauses in their licenses. E.g. Hadoop and
> >> Cassandra use lz4, and I'd be surprised if the companies behind those
> >> have opened themselves to litigation.
> >>
> >> I think we should preliminarily decide which algorithm to use before we
> >> get lawyers involved. I'd be surprised if they could make such an analysis
> >> faster than we can rule out one of them via benchmarks.
> >>
> >> Will post an updated patch that includes lz4 as well.
> >
> > Attached.
>
> Well, the performance of both snappy and lz4 seems to be significantly
> better than pglz. On these tests lz4 has a small edge but that might
> not be true on other data sets.

From what I've seen of independent benchmarks on more varied datasets,
and from what I tested myself (without PG in between), lz4 usually has a
bigger margin than this, especially on decompression.
The implementation also seems to be better prepared to run on more
platforms, e.g. it didn't require any fiddling with endian.h in contrast
to snappy.
But yes, "even" snappy would be a big improvement should lz4 turn out to
be problematic, and the performance difference isn't as big as I'd hoped,
so it isn't enough to rule one of them out.

> I still think the main issue is legal
> review: are there any license or patent concerns about including
> either of these algorithms in PG? If neither of them have issues, we
> might need to experiment a little more before picking between them.
> If one does and the other does not, well, then it's a short
> conversation.

True. So, how do we proceed on that?

The ASF decided it was safe to use lz4 in Cassandra. Does anybody have
contacts over there?

Btw, I have the feeling we hold this topic to a higher standard wrt
patent issues than other work in postgres...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
