Re: [HACKERS] Custom compression methods

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
Subject: Re: [HACKERS] Custom compression methods
Date: 2017-12-02 21:53:10
Message-ID: c1b3880a-7a4c-c5bd-7be8-beec0c8684b8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/02/2017 09:38 PM, Andres Freund wrote:
> Hi,
>
> On 2017-12-02 16:04:52 +0100, Tomas Vondra wrote:
>> Firstly, it's going to be quite hard (or perhaps impossible) to find an
>> algorithm that is "universally better" than pglz. Some algorithms do
>> work better for text documents, some for binary blobs, etc. I don't
>> think there's a win-win option.
>
> lz4 is pretty much there.
>

That's a matter of opinion, I guess. It's a solid compression algorithm,
that's for sure ...

>> Secondly, all the previous attempts ran into some legal issues, i.e.
>> licensing and/or patents. Maybe the situation changed since then (no
>> idea, haven't looked into that), but in the past the "pluggable"
>> approach was proposed as a way to address this.
>
> Those were pretty bogus.

IANAL so I don't dare to judge on bogusness of such claims. I assume if
we made it optional (e.g. configure/initdb option, it'd be much less of
an issue). Of course, that has disadvantages too (because when you
compile/init with one algorithm, and then find something else would work
better for your data, you have to start from scratch).

>
> I think we're not doing our users a favor if they've to download
> some external projects, then fiddle with things, just to not choose
> a compression algorithm that's been known bad for at least 5+ years.
> If we've a decent algorithm in-core *and* then allow extensibility,
> that's one thing, but keeping the bad and tell forks "please take
> our users with this code we give you" is ...
>

I don't understand what exactly is your issue with external projects,
TBH. I think extensibility is one of the great strengths of Postgres.
It's not all rainbows and unicorns, of course, and it has costs too.

FWIW I don't think pglz is a "known bad" algorithm. Perhaps there are
cases where other algorithms (e.g. lz4) are running circles around it,
particularly when it comes to decompression speed, but I wouldn't say
it's "known bad".

Not sure which forks you're talking about ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-12-02 23:44:50 Re: Bitmap scan is undercosted?
Previous Message legrand legrand 2017-12-02 21:33:08 Re: Partition pruning for Star Schema