Re: [HACKERS] Custom compression methods

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2017-12-14 17:23:30
Message-ID: f86f1a2f-9510-e712-2a53-bfc17f6d2414@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/14/2017 04:21 PM, Robert Haas wrote:
> On Wed, Dec 13, 2017 at 5:10 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> 2. If several data types can benefit from a similar approach, it has
>>> to be separately implemented for each one.
>>
>> I don't think the current solution improves that, though. If you
>> want to exploit internal features of individual data types, it
>> pretty much requires code customized to every such data type.
>>
>> For example you can't take the tsvector compression and just slap
>> it on tsquery, because it relies on knowledge of internal tsvector
>> structure. So you need separate implementations anyway.
>
> I don't think that's necessarily true. Certainly, it's true that
> *if* tsvector compression depends on knowledge of internal tsvector
> structure, *then* that you can't use the implementation for anything
> else (this, by the way, means that there needs to be some way for a
> compression method to reject being applied to a column of a data
> type it doesn't like).

I believe such dependency (on implementation details) is pretty much the
main benefit of datatype-aware compression methods. If you don't rely on
such assumption, then I'd say it's a general-purpose compression method.

> However, it seems possible to imagine compression algorithms that can
> work for a variety of data types, too. There might be a compression
> algorithm that is theoretically a general-purpose algorithm but has
> features which are particularly well-suited to, say, JSON or XML
> data, because it looks for word boundaries to decide on what strings
> to insert into the compression dictionary.
>

Can you give an example of such algorithm? Because I haven't seen such
example, and I find arguments based on hypothetical compression methods
somewhat suspicious.

FWIW I'm not against considering such compression methods, but OTOH it
may not be such a great primary use case to drive the overall design.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2017-12-14 17:33:22 Re: procedures and plpgsql PERFORM
Previous Message Pavel Stehule 2017-12-14 16:46:47 Re: procedures and plpgsql PERFORM