Re: partitioning - changing a slot's descriptor is expensive

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: partitioning - changing a slot's descriptor is expensive
Date: 2018-06-27 05:55:58
Message-ID: 20180627055558.da3nktjtt4ic32a4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
> Hi Andres,
>
> On 2018/06/27 14:09, Andres Freund wrote:
> > Hi,
> >
> > (sorry if I CCed the wrong folks, the code has changed a fair bit)
> >
> > I noticed that several places in the partitioning code look like:
> >
> > /*
> > * If the tuple has been routed, it's been converted to the
> > * partition's rowtype, which might differ from the root
> > * table's. We must convert it back to the root table's
> > * rowtype so that val_desc shown error message matches the
> > * input tuple.
> > */
> > if (resultRelInfo->ri_PartitionRoot)
> > {
> > TableTuple tuple = ExecFetchSlotTuple(slot);
> > TupleConversionMap *map;
> >
> > rel = resultRelInfo->ri_PartitionRoot;
> > tupdesc = RelationGetDescr(rel);
> > /* a reverse map */
> > map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> > gettext_noop("could not convert row type"));
> > if (map != NULL)
> > {
> > tuple = do_convert_tuple(tuple, map);
> > ExecSetSlotDescriptor(slot, tupdesc);
> > ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> > }
> > }
>
> This particular code block (and a couple of others like it) are executed
> only if a tuple fails partition constraint check, so maybe not a very
> frequently executed piece of code.
>
> There is however similar code that runs in non-error paths, such as in
> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
> the root parent have differing attribute numbers. So, one might say that
> that code's there to handle the special cases which may not arise much in
> practice, but it may still be a good idea to make improvements if we can
> do so.

Yea, I was referring to all do_convert_tuple() callers, and some of them
are more hot-path than the specific one above. E.g. the
ConvertPartitionTupleSlot() call in CopyFrom().

> > Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
> > to reallocate the values/isnull arrays (and potentially do desc pinning
> > etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> > be a virtual slot. Calling heap_deform_tuple(), as done in
> > do_convert_tuple(), might be superfluous work, because the input tuple
> > might already be entirely deformed in the input slot.
> >
> > I think it'd be good to rewrite the code so there's an input and an
> > output slot that each will keep their slot descriptors set.
> >
> > Besides performance as the code stands, this'll also be important for
> > pluggable storage (as we'd otherwise get a *lot* of back/forth
> > conversions between tuple formats).
> >
> > Anybody interesting in writing a patch that redoes this?
>
> Thanks for the pointers above, I will see if I can produce a patch and
> will also be interested in hearing what others may have to say.

Cool.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajkumar Raghuwanshi 2018-06-27 05:59:45 Re: unexpected relkind: 73 ERROR with partition table index
Previous Message Amit Kapila 2018-06-27 05:54:11 Re: Thinko/typo in ExecSimpleRelationInsert