Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-27 09:14:54
Message-ID: CAFiTN-sTcQq2iszeTxZum4x+0jK1x-PHfVDJqUAQCsCTBEeDQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 27, 2021 at 7:44 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> I just realized that there is one more function
> "heap_copy_tuple_as_datum" which is flattening the tuple based on the
> HeapTupleHasExternal check, so I think I will have to analyze the
> caller of this function as well and need to do a similar analysis,
> although there are just a few callers for this. And, I think the fix
> in ExecEvalConvertRowtype is wrong, we will have to do something for
> the compressed type here as well. I am not sure what is the best way
> to fix it because we are directly getting the input tuple so we can
> not put an optimization of dettoasting before forming the tuple. We
> might detoast in execute_attr_map_tuple, when the source and target
> row types are different because we are anyway deforming and processing
> each filed in that function but the problem is execute_attr_map_tuple
> is used at multiple places but for that, we can make another version
> of this function which actually detoast along with conversion and use
> that in ExecEvalConvertRowtype. But if there is no tuple conversion
> needed then we directly use heap_copy_tuple_as_datum and in that case,
> there is no deforming at all so maybe, in this case, we can not do
> anything but I think ExecEvalConvertRowtype should not be the very
> common path.

I have done further analysis for this, basically,
ExecEvalConvertRowtype can never have the compressed/external data
because it is converting from one composite type to another composite
type and while forming the composite type only we ensure that there
can not be any compressed/external data. Refer below comments in
ExecEvalConvertRowtype

/*
* The tuple is physically compatible as-is, but we need to insert the
* destination rowtype OID in its composite-datum header field, so we
* have to copy it anyway. heap_copy_tuple_as_datum() is convenient
* for this since it will both make the physical copy and insert the
* correct composite header fields. Note that we aren't expecting to
* have to flatten any toasted fields: the input was a composite
* datum, so it shouldn't contain any. So heap_copy_tuple_as_datum()
* is overkill here, but its check for external fields is cheap.
*/
*op->resvalue = heap_copy_tuple_as_datum(&tmptup, outdesc);

For heap_copy_tuple_as_datum, I have removed the external tuple check
and instead I have passed a parameter whether we need to flatten or
not. So the callers who are sure that they can not have any
compressed/external field should only pass false so that it will
completely skip the flattening path for those callers. But after
doing that in some of the callers especially
ExecFetchSlotHeapTupleDatum and
SPI_returntuple we will have to process the complete tuple when the
function's return type is tuple. I am not sure how to optimize this
because this is directly getting the tuple from the function. I am
not too much worried about the other callers like
PLyMapping_ToComposite, PLySequence_ToComposite and
PLyGenericObject_ToComposite because in these function we are forming
tuple from value before calling heap_copy_tuple_as_datum so if we
think these are performance critical paths then we have a way to
detoast even before forming the tuple.

Another function which I think can be problematic is
"expanded_record_set_tuple", because if we don't handle the compressed
types in this function then we can not go with the assumption that the
composite will never have compressed data. I am not completely sure
how much of a problem that can be? Maybe if we don't do anything here
then we might need to do something in ExecEvalConvertRowtype because
therein we assume that composite type can not contain compressed data
as well.

I have also reviewed the patch for the default compression method GUC
and made some changes.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v28-0004-default-to-with-lz4.patch text/x-patch 1.7 KB
v28-0001-Disallow-compressed-data-inside-container-types.patch text/x-patch 47.4 KB
v28-0003-Add-default_toast_compression-GUC.patch text/x-patch 12.2 KB
v28-0002-Built-in-compression-method.patch text/x-patch 111.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-02-27 10:05:53 Re: Tid scan improvements
Previous Message Amit Kapila 2021-02-27 09:05:37 Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.