Re: [HACKERS] Custom compression methods

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-16 18:54:55
Message-ID: 20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-03-16 10:27:12 -0400, Robert Haas wrote:
> > I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm()
> > contain a nearly identical copy of the same code. And
> > make_tuple_from_row() also is similar. It seem that there should be a
> > heap_form_tuple() version doing this for us?
>
> I was worried about having either a performance impact or code
> duplication. The actual plan where you could insert this organically
> is in fill_val(), which is called from heap_fill_tuple(), which is
> called from heap_form_tuple().

Oh, I guess it would make sense to do it that way. However, I was just
thinking of doing the iteration over the tuples that ExecEvalRow() etc
do inside heap_form_flattened_tuple() (or whatever). That'd not be any
worse than what the patch is doing now, just less duplication, and an
easier path towards optimizing it if we notice that we need to?

> If you don't mind passing down 'int flags' or similar to all those,
> and having additional branches to make the behavior dependent on the
> flags, I'm cool with it. Or if you think we should template-ize all
> those functions, that'd be another way to go. But I was afraid I would
> get complaints about adding overhead to hot code paths.

An option for fill_val() itself would probably be fine. It's already an
inline, and if it doesn't get inlined, we could force the compilers hand
with pg_attribute_always_inline.

The harder part would probably be to find a way to deal with the layers
above, without undue code duplication. I think it's not just fill_val()
that'd need to know, but also heap_compute_data_size(),
heap_fill_tuple() - both of which are externally visible (and iirc thus
not going to get inlined with many compiler options, due to symbol
interposition dangers). But we could have a
heap_compute_data_size_internal(bool flatten) that's called by
heap_compute_data_size(). And something similar for heap_form_tuple().

But that's complicated, so I'd just go with the iteration in a
heap_form_tuple() wrapper for now.

> > > I'm open to being convinced that we don't need to do either of these
> > > things, and that the cost of iterating over all varlenas in the tuple
> > > is not so bad as to preclude doing things as you have them here. But,
> > > I'm afraid it's going to be too expensive.
> >
> > I mean, I would just define several of those places away by not caring
> > about tuples in a different compressino formation ending up in a
> > table...
>
> That behavior feels unacceptable to me from a user expectations point
> of view. I think there's an argument that if I update a tuple that
> contains a compressed datum, and I don't update that particular
> column, I think it would be OK to not recompress the column. But, if I
> insert data into a table, I as a user would expect that the
> compression settings for that column are going to be respected.

IDK. The user might also expect that INSERT .. SELECT is fast, instead
of doing expensive decompression + compression (with pglz the former can
be really slow). I think there's a good argument for having an explicit
"recompress" operation, but I'm not convincd that doing things
implicitly is good, especially if it causes complications in quite a few
places.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-03-16 19:08:39 Re: shared-memory based stats collector
Previous Message Tom Lane 2021-03-16 18:52:22 Re: pg_amcheck contrib application