Re: partition routing layering in nodeModifyTable.c

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, amitdkhan(dot)pg(at)gmail(dot)com
Subject: Re: partition routing layering in nodeModifyTable.c
Date: 2019-08-05 09:16:10
Message-ID: CA+HiwqFhgubTo32xF0ca_-N4-UWNZ_YBQi_kLnAdgmhZTTBLuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres, Fujita-san,

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-07-31 17:04:38 +0900, Amit Langote wrote:
> > I looked into trying to do the things I mentioned above and it seems
> > to me that revising BeginDirectModify()'s API to receive the
> > ResultRelInfo directly as Andres suggested might be the best way
> > forward. I've implemented that in the attached 0001. Patches that
> > were previously 0001, 0002, and 0003 are now 0002, 003, and 0004,
> > respectively. 0002 is now a patch to "remove"
> > es_result_relation_info.
>
> Thanks! Some minor quibbles aside, the non FDW patches look good to me.
>
> Fujita-san, do you have any comments on the FDW API change? Or anybody
> else?

Based on the discussion, I have updated the patch.

> I'm a bit woried about the move of BeginDirectModify() into
> nodeModifyTable.c - it just seems like an odd control flow to me. Not
> allowing any intermittent nodes between ForeignScan and ModifyTable also
> seems like an undesirable restriction for the future. I realize that we
> already do that for BeginForeignModify() (just btw, that already accepts
> resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
> makes sense), but it still seems like the wrong direction to me.
>
> The need for that move, I assume, comes from needing knowing the correct
> ResultRelInfo, correct? I wonder if we shouldn't instead determine the
> at plan time (in setrefs.c), somewhat similar to how we determine
> ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

The patch adds a resultRelIndex field to ForeignScan node, which is
set to >= 0 value for non-SELECT queries. I first thought to set it
only if direct modification is being used, but maybe it'd be simpler
to set it even if direct modification is not used. To set it, the
patch teaches set_plan_refs() to initialize resultRelIndex of
ForeignScan plans that appear under ModifyTable. Fujita-san said he
plans to revise the planning of direct-modification style queries to
not require a ModifyTable node anymore, but maybe he'll just need to
add similar code elsewhere but not outside setrefs.c.

> Then we could just have BeginForeignModify, BeginDirectModify,
> BeginForeignScan all be called from ExecInitForeignScan().

I too think that it would've been great if we could call both
BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but
the former's API seems to be designed to be called from
ExecInitModifyTable from the get-go. Maybe we should leave that
as-is?

> Path 04 is such a nice improvement. Besides getting rid of a substantial
> amount of code, it also makes the control flow a lot easier to read.

Thanks.

> > @@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
> > * If there are no triggers in 'trigdesc' that request relevant transition
> > * tables, then return NULL.
> > *
> > - * The resulting object can be passed to the ExecAR* functions. The caller
> > - * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
> > - * with child tables.
> > + * The resulting object can be passed to the ExecAR* functions.
> > *
> > * Note that we copy the flags from a parent table into this struct (rather
> > * than subsequently using the relation's TriggerDesc directly) so that we can
> > @@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
> > */
> > if (row_trigger && transition_capture != NULL)
> > {
> > - TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
> > - TupleConversionMap *map = transition_capture->tcs_map;
> > + TupleTableSlot *original_insert_tuple;
> > + PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
> > + TupleConversionMap *map = pinfo ?
> > + pinfo->pi_PartitionToRootMap :
> > + relinfo->ri_ChildToRootMap;
> > bool delete_old_table = transition_capture->tcs_delete_old_table;
> > bool update_old_table = transition_capture->tcs_update_old_table;
> > bool update_new_table = transition_capture->tcs_update_new_table;
> > bool insert_new_table = transition_capture->tcs_insert_new_table;
> >
> > /*
> > + * Get the originally inserted tuple from the global variable and set
> > + * the latter to NULL because any given tuple must be read only once.
> > + * Note that the TransitionCaptureState is shared across many calls
> > + * to this function.
> > + */
> > + original_insert_tuple = transition_capture->tcs_original_insert_tuple;
> > + transition_capture->tcs_original_insert_tuple = NULL;
>
> Maybe I'm missing something, but original_insert_tuple is not a global
> variable?

I really meant to refer to the fact that it's maintained in a
ModifyTable-global struct. I've updated this comment a bit.

> > @@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
> > PartitionTupleRouting *proute,
> > PartitionDispatch dispatch,
> > ResultRelInfo *partRelInfo,
> > - int partidx)
> > + int partidx,
> > + bool is_update_result_rel)
> > {
> > MemoryContext oldcxt;
> > PartitionRoutingInfo *partrouteinfo;
> > @@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
> > if (mtstate &&
> > (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
> > {
> > - partrouteinfo->pi_PartitionToRootMap =
> > - convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
> > - RelationGetDescr(partRelInfo->ri_PartitionRoot),
> > - gettext_noop("could not convert row type"));
> > + /* If partition is an update target, then we already got the map. */
> > + if (is_update_result_rel)
> > + partrouteinfo->pi_PartitionToRootMap =
> > + partRelInfo->ri_ChildToRootMap;
> > + else
> > + partrouteinfo->pi_PartitionToRootMap =
> > + convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
> > + RelationGetDescr(partRelInfo->ri_PartitionRoot),
> > + gettext_noop("could not convert row type"));
> > }
>
> Hm, isn't is_update_result_rel just ModifyTable->operation == CMD_UPDATE?

No. The operation being CMD_UPDATE doesn't mean that the
ResultRelInfo that is passed to ExecInitRoutingInfo() is an UPDATE
result rel. It could be a ResultRelInfo built by ExecFindPartition()
when a row needed to be moved into a partition that is not present in
the UPDATE result rels contained in ModifyTableState. Though I
realized that we don't really need to add a new parameter to figure
that out. Looking at ri_RangeTableIndex property of the passed-in
ResultRelInfo is enough to distinguish the two types of
ResultRelInfos. I've updated the patch that way.

I found more dead code related to transition capture setup, which I've
removed in the latest 0004. For example, the
mt_per_subplan_tupconv_maps array and the code in nodeModifyTable.c
that was used to initialize it.

Attached updated patches.

Thanks,
Amit

Attachment Content-Type Size
v4-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch application/octet-stream 11.0 KB
v4-0002-Remove-es_result_relation_info.patch application/octet-stream 36.0 KB
v4-0004-Refactor-transition-tuple-capture-code-a-bit.patch application/octet-stream 20.8 KB
v4-0003-Rearrange-partition-update-row-movement-code-a-bi.patch application/octet-stream 16.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Floris Van Nee 2019-08-05 10:05:39 Re: Index Skip Scan
Previous Message Thomas Munro 2019-08-05 08:58:05 Re: Remove HeapTuple and Buffer dependency for predicate locking functions