Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-16 14:27:12
Message-ID: CA+TgmoYKuGxkLj6JYHC6RxZ+e173yyX_4Nx6jZupkCL5_x_=VA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 15, 2021 at 6:58 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I don't particularly like PG_RETURN_HEAPTUPLEHEADER_RAW(). What is "raw"
> about it? It also seems to me like there needs to at least be a
> sentence or two explaining when to use which of the functions.

It seemed like the natural name to me; we use "raw" elsewhere to mean
that fewer things are magically addressed on behalf of the caller,
e.g. HeapTupleHeaderGetRawXmin. I'm open to suggestions, however.

> I think heap_copy_tuple_as_raw_datum() should grow an assert checking
> there are no external columns?

Yeah, could be done.

> I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm()
> contain a nearly identical copy of the same code. And
> make_tuple_from_row() also is similar. It seem that there should be a
> heap_form_tuple() version doing this for us?

I was worried about having either a performance impact or code
duplication. The actual plan where you could insert this organically
is in fill_val(), which is called from heap_fill_tuple(), which is
called from heap_form_tuple(). If you don't mind passing down 'int
flags' or similar to all those, and having additional branches to make
the behavior dependent on the flags, I'm cool with it. Or if you think
we should template-ize all those functions, that'd be another way to
go. But I was afraid I would get complaints about adding overhead to
hot code paths.

> > I'm open to being convinced that we don't need to do either of these
> > things, and that the cost of iterating over all varlenas in the tuple
> > is not so bad as to preclude doing things as you have them here. But,
> > I'm afraid it's going to be too expensive.
>
> I mean, I would just define several of those places away by not caring
> about tuples in a different compressino formation ending up in a
> table...

That behavior feels unacceptable to me from a user expectations point
of view. I think there's an argument that if I update a tuple that
contains a compressed datum, and I don't update that particular
column, I think it would be OK to not recompress the column. But, if I
insert data into a table, I as a user would expect that the
compression settings for that column are going to be respected.
Deciding that's optional because we don't have a good way of making it
fast seems like a major cop-out, at least to me. I think from a user
perspective you don't expect INSERT INTO .. SELECT FROM to create a
different final state than a dump and reload, and that if we deviate
from that people are gonna be unhappy. I could be wrong; maybe it's
only me who would be unhappy.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2021-03-16 14:52:18 re: crash during cascaded foreign key update
Previous Message Tom Lane 2021-03-16 14:17:46 Re: crash during cascaded foreign key update