Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-03 21:19:13
Message-ID: CA+TgmoZx4kTi4H4yvmtshhG7drG18GS1fOL1FhE4v+34WURpSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Does this patch need to do something about ExtractReplicaIdentity()?
If there are compressed fields in the tuple being built, can we rely
on the decompression engine being available at the time we need to do
something with the tuple?

More generally, I think it would be good to divide up 0001 into at
least 3 parts:

- The first part would invent HeapTupleGetRawDatum() and
HeapTupleHeaderGetRawDatum() and use them in place of the existing
functions everywhere that it's safe to do so. The current patch just
switches to using PointerGetDatum() but I think we should instead add
something like static inline Datum
HeapTupleHeaderGetRawDatum(HeapTupleHeader tuple) {
Assert(!HeapTupleHeaderHasExternal(tup)); return
PointerGetDatum(tuple); } This provides some type safety while being
just as fast as a direct use of PointerGetDatum() in optimized code. I
think that the Assert will have to be ripped out if we proceed with
the other patches, but if we can have it at this stage, so much the
better. I think this patch should also invent
PG_RETURN_HEAPTUPLEHEADER_RAW and likewise use that where appropriate
- including, I think, in place of cases that are now using
PG_RETURN_DATUM(HeapTupleGetDatum(...)). All of these changes make
sense from an efficiency standpoint apart from any possible
definitional changes.

- The second part would optimize code that the first part cannot
safely convert to use the "raw" versions. For example, the changes to
ExecEvalRow() can go in this part. You can view this part as getting
rid of calls to HeapTupleGetDatum(), HeapTupleHeaderGetDatum(), and/or
PG_RETURN_HEAPTUPLE() that couldn't be changed categorically, but can
be changed if we make some other code changes. These changes, too, can
potentially be justified on performance grounds independently of
anything else.

- Then we could maybe have some more patches that make other kinds of
preparatory changes. I'm not too sure exactly what should go in here,
or whether it should be 1 patch or several or maybe 0. But if there's
preparatory stuff that's potentially separately committable and not
the same as the stuff above, then it should go into patches here.

- The last patch would actually change the rule for composite datums.

One advantage of this is that if we have to revert the last patch for
some reason we are not ripping the entire thing, churning the code
base and widely-used APIs for everyone. Another advantage is that
getting those first two patches committed or even just applied locally
on a branch would, at least IMHO, make it a lot simpler to see what
potential problem spots remain - and by "problem" I mean mostly from a
performance point of view.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Rofail 2021-03-03 21:31:49 Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Previous Message Mats Kindahl 2021-03-03 21:15:18 Re: Table AM and DDLs