Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-16 19:31:01
Message-ID: CA+TgmoZ0p-ez_6unDsp4gG7Q75nGXYqDbcVKi9K6tPNE3cPOnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 2:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Oh, I guess it would make sense to do it that way. However, I was just
> thinking of doing the iteration over the tuples that ExecEvalRow() etc
> do inside heap_form_flattened_tuple() (or whatever). That'd not be any
> worse than what the patch is doing now, just less duplication, and an
> easier path towards optimizing it if we notice that we need to?

It's a question of whether you copy the datum array. I don't think a
generic function can assume that it's OK to scribble on the input
array, or if it does, that'd better be very prominently mentioned in
the comments. And copying into a new array has its own costs. 0002 is
based on the theory that scribbling on the executor's array won't
cause any problem, which I *think* is true, but isn't correct in all
cases (e.g. if the input data is coming from a slot). If we pass a
flag down to fill_val() and friends then we don't end up having to
copy the arrays over so the problem goes away in that design.

> The harder part would probably be to find a way to deal with the layers
> above, without undue code duplication. I think it's not just fill_val()
> that'd need to know, but also heap_compute_data_size(),
> heap_fill_tuple() - both of which are externally visible (and iirc thus
> not going to get inlined with many compiler options, due to symbol
> interposition dangers). But we could have a
> heap_compute_data_size_internal(bool flatten) that's called by
> heap_compute_data_size(). And something similar for heap_form_tuple().

Hmm, yeah, that's not great. I guess there's nothing expensive we need
to repeat - I think anyway - because we should be able to get the
uncompressed size from the TOAST pointer itself. But the code would
have to know to do that, as you say.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-03-16 19:31:42 Re: pg_amcheck contrib application
Previous Message Andrew Dunstan 2021-03-16 19:21:39 Re: pg_amcheck contrib application