Re: partitioning - changing a slot's descriptor is expensive

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Andres Freund <andres(at)anarazel(dot)de>, Amit Langote <amitlangote09(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: partitioning - changing a slot's descriptor is expensive
Date: 2018-06-27 05:46:26
Message-ID: 9f307f38-e04d-2271-9500-b7424c5d86d8@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

On 2018/06/27 14:09, Andres Freund wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
> /*
> * If the tuple has been routed, it's been converted to the
> * partition's rowtype, which might differ from the root
> * table's. We must convert it back to the root table's
> * rowtype so that val_desc shown error message matches the
> * input tuple.
> */
> if (resultRelInfo->ri_PartitionRoot)
> {
> TableTuple tuple = ExecFetchSlotTuple(slot);
> TupleConversionMap *map;
>
> rel = resultRelInfo->ri_PartitionRoot;
> tupdesc = RelationGetDescr(rel);
> /* a reverse map */
> map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> gettext_noop("could not convert row type"));
> if (map != NULL)
> {
> tuple = do_convert_tuple(tuple, map);
> ExecSetSlotDescriptor(slot, tupdesc);
> ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> }
> }

This particular code block (and a couple of others like it) are executed
only if a tuple fails partition constraint check, so maybe not a very
frequently executed piece of code.

There is however similar code that runs in non-error paths, such as in
ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
the root parent have differing attribute numbers. So, one might say that
that code's there to handle the special cases which may not arise much in
practice, but it may still be a good idea to make improvements if we can
do so.

> Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.
>
> Besides performance as the code stands, this'll also be important for
> pluggable storage (as we'd otherwise get a *lot* of back/forth
> conversions between tuple formats).
>
> Anybody interesting in writing a patch that redoes this?

Thanks for the pointers above, I will see if I can produce a patch and
will also be interested in hearing what others may have to say.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-06-27 05:54:11 Re: Thinko/typo in ExecSimpleRelationInsert
Previous Message Thomas Munro 2018-06-27 05:26:39 Re: Excessive PostmasterIsAlive calls slow down WAL redo