Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-04 10:33:26
Message-ID: CAFiTN-s4YJaHReVr=S+1QCmKiH1u_K+QSaP9prpGoH1yuAz-uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 4, 2021 at 2:49 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Hi,
>
> Does this patch need to do something about ExtractReplicaIdentity()?
> If there are compressed fields in the tuple being built, can we rely
> on the decompression engine being available at the time we need to do
> something with the tuple?

We log the replica identity tuple in the WAL so that later walsender
can stream this to the subscriber, and before sending to the
subscriber anyway we have detoast all the data. Said that I think the
problem you are worried about is not only with 'replica identity
tuple' but it is with any tuple. I mean we copy the compressed field
as it is in WAL and suppose we copy some fields which are compressed
with lz4 and then we restart the server with another binary that is
compiled without lz4. Now, the problem is the walsender can not
decompress those data.

> More generally, I think it would be good to divide up 0001 into at
> least 3 parts:
>
> - The first part would invent HeapTupleGetRawDatum() and
> HeapTupleHeaderGetRawDatum() and use them in place of the existing
> functions everywhere that it's safe to do so. The current patch just
> switches to using PointerGetDatum() but I think we should instead add
> something like static inline Datum
> HeapTupleHeaderGetRawDatum(HeapTupleHeader tuple) {
> Assert(!HeapTupleHeaderHasExternal(tup)); return
> PointerGetDatum(tuple); } This provides some type safety while being
> just as fast as a direct use of PointerGetDatum() in optimized code. I
> think that the Assert will have to be ripped out if we proceed with
> the other patches, but if we can have it at this stage, so much the
> better. I think this patch should also invent
> PG_RETURN_HEAPTUPLEHEADER_RAW and likewise use that where appropriate
> - including, I think, in place of cases that are now using
> PG_RETURN_DATUM(HeapTupleGetDatum(...)). All of these changes make
> sense from an efficiency standpoint apart from any possible
> definitional changes.
>
> - The second part would optimize code that the first part cannot
> safely convert to use the "raw" versions. For example, the changes to
> ExecEvalRow() can go in this part. You can view this part as getting
> rid of calls to HeapTupleGetDatum(), HeapTupleHeaderGetDatum(), and/or
> PG_RETURN_HEAPTUPLE() that couldn't be changed categorically, but can
> be changed if we make some other code changes. These changes, too, can
> potentially be justified on performance grounds independently of
> anything else.
>
> - Then we could maybe have some more patches that make other kinds of
> preparatory changes. I'm not too sure exactly what should go in here,
> or whether it should be 1 patch or several or maybe 0. But if there's
> preparatory stuff that's potentially separately committable and not
> the same as the stuff above, then it should go into patches here.
>
> - The last patch would actually change the rule for composite datums.
>
> One advantage of this is that if we have to revert the last patch for
> some reason we are not ripping the entire thing, churning the code
> base and widely-used APIs for everyone. Another advantage is that
> getting those first two patches committed or even just applied locally
> on a branch would, at least IMHO, make it a lot simpler to see what
> potential problem spots remain - and by "problem" I mean mostly from a
> performance point of view.

Okay, I will work on this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-03-04 10:38:08 Re: Improvements and additions to COPY progress reporting
Previous Message Pavel Stehule 2021-03-04 10:28:16 Re: proposal - psql - use pager for \watch command