Re: [HACKERS] Custom compression methods

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2017-12-13 18:34:49
Message-ID: 3cc48e8f-8d07-b5f9-ff13-dcfd3e0ec169@2ndquadrant.com
Lists: pgsql-hackers

On 12/13/2017 05:55 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> On 12/13/2017 01:54 AM, Robert Haas wrote:
>
>>> 3. Compression is only applied to large-ish values. If you are just
>>> making the data type representation more compact, you probably want to
>>> apply the new representation to all values. If you are compressing in
>>> the sense that the original data gets smaller but harder to interpret,
>>> then you probably only want to apply the technique where the value is
>>> already pretty wide, and maybe respect the user's configured storage
>>> attributes. TOAST knows about some of that kind of stuff.
>>
>> Good point. One such parameter that I really miss is compression level.
>> I can imagine tuning it through CREATE COMPRESSION METHOD, but it does
>> not seem quite possible with compression happening in a datatype.
>
> Hmm, actually isn't that the sort of thing that you would tweak using a
> column-level option instead of a compression method?
> ALTER TABLE ALTER COLUMN SET (compression_level=123)
> The only thing we need for this is to make tuptoaster.c aware of the
> need to check for a parameter.
>

Wouldn't that require some universal compression level, shared by all
supported compression algorithms? I don't think there is such a thing.

Defining one should not be extremely difficult, although I'm sure there
will be some cumbersome cases. For example, what if algorithm "a"
supports compression levels 0-10, and algorithm "b" only supports 0-3?

You could define 11 "universal" compression levels and map the four
levels of "b" onto them (but how?). And then everyone has to understand
how that "universal" mapping is defined.

Another issue is that there are algorithms without a compression level
(e.g. pglz does not have one, AFAICS), or with a somewhat different
definition (lz4 does not have levels, and instead has "acceleration",
which may be an arbitrary positive integer, so it's not really
compatible with a "universal" compression level).
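
To make the mismatch concrete, here is roughly what the two public
APIs look like side by side (a sketch against stock zlib and lz4;
error handling and realistic buffer sizing omitted):

    #include <zlib.h>   /* compress2(): level 0-9, higher = smaller output */
    #include <lz4.h>    /* LZ4_compress_fast(): acceleration >= 1 */

    int
    main(void)
    {
        const char  src[] = "some compressible payload";
        char        dst[256];
        uLongf      zlen = sizeof(dst);

        /* zlib: a bounded "level", higher means better compression */
        compress2((Bytef *) dst, &zlen, (const Bytef *) src, sizeof(src), 9);

        /* lz4: an unbounded "acceleration", higher means *less*
         * compression -- the two scales point in opposite directions */
        LZ4_compress_fast(src, dst, sizeof(src), sizeof(dst), 50);

        return 0;
    }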

So to me the

ALTER TABLE ALTER COLUMN SET (compression_level=123)

seems more like an unnecessary hurdle ...

>>> I don't think TOAST needs to be entirely transparent for the
>>> datatypes. We've already dipped our toe in the water by allowing some
>>> operations on "short" varlenas, and there's really nothing to prevent
>>> a given datatype from going further. The OID problem you mentioned
>>> would presumably be solved by hard-coding the OIDs for any built-in,
>>> privileged compression methods.
>>
>> Stupid question, but what do you mean by "short" varlenas?
>
> Those are varlenas with 1-byte header rather than the standard 4-byte
> header.
>

OK, that's what I thought. But that is still pretty much transparent to
the data types, no?
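
For reference, the usual pattern in datatype code looks roughly like
this (a sketch using the stock varlena macros from postgres.h,
assuming the value is already detoasted; the helper name is made up):

    #include "postgres.h"

    /*
     * The 1-byte-header ("short") case is handled by dedicated macros,
     * so a datatype function mostly just branches once and never cares
     * about the header layout again.
     */
    static void
    get_payload(struct varlena *v, char **data, int *len)
    {
        if (VARATT_IS_SHORT(v))
        {
            *data = VARDATA_SHORT(v);
            *len = VARSIZE_SHORT(v) - VARHDRSZ_SHORT;
        }
        else
        {
            *data = VARDATA(v);
            *len = VARSIZE(v) - VARHDRSZ;
        }
    }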

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
