Re: partition routing layering in nodeModifyTable.c

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, amitdkhan(dot)pg(at)gmail(dot)com
Subject: Re: partition routing layering in nodeModifyTable.c
Date: 2020-10-07 09:50:54
Message-ID: CA+HiwqGAVa9J9hNLH7tVCs13UqJMP-9RU_T5uAWEAfsg8VEXGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hekki,

Thanks a lot for the review!

On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 13/07/2020 08:47, Amit Langote wrote:
> > It's been over 11 months since there was any significant commentary on
> > the contents of the patches themselves, so perhaps I should reiterate
> > what the patches are about and why it might still be a good idea to
> > consider them.
> >
> > The thread started with some very valid criticism of the way
> > executor's partition tuple routing logic looks randomly sprinkled over
> > in nodeModifyTable.c, execPartition.c. In the process of making it
> > look less random, we decided to get rid of the global variable
> > es_result_relation_info to avoid complex maneuvers of
> > setting/resetting it correctly when performing partition tuple
> > routing, causing some other churn beside the partitioning code. Same
> > with another global variable TransitionCaptureState.tcs_map. So, the
> > patches neither add any new capabilities, nor improve performance, but
> > they do make the code in this area a bit easier to follow.
> >
> > Actually, there is a problem that some of the changes here conflict
> > with patches being discussed on other threads ([1], [2]), so much so
> > that I decided to absorb some changes here into another "refactoring"
> > patch that I have posted at [2].
>
> Thanks for the summary. It's been a bit hard to follow what depends on
> what across these threads, and how they work together. It seems that
> this patch set is the best place to start.

Great. I'd be happy if I will have one less set of patches to keep at home. :-)

> > Attached rebased patches.
> >
> > 0001 contains preparatory FDW API changes to stop relying on
> > es_result_relation_info being set correctly.
>
> Makes sense. The only thing I don't like about this is the way the
> ForeignScan->resultRelIndex field is set. make_foreignscan() initializes
> it to -1, and the FDW's PlanDirectModify() function is expected to set
> it, like you did in postgres_fdw:
>
> > @@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
> > rebuild_fdw_scan_tlist(fscan, returningList);
> > }
> >
> > + /*
> > + * Set the index of the subplan result rel.
> > + */
> > + fscan->resultRelIndex = subplan_index;
> > +
> > table_close(rel, NoLock);
> > return true;
> > }
>
> It has to be set to that value (subplan_index is an argument to
> PlanDirectModify()), the FDW doesn't have any choice there, so this is
> just additional boilerplate code that has to be copied to every FDW that
> implements direct modify. Furthermore, if the FDW doesn't set it
> correctly, you could have some very interesting results, like updating
> wrong table. It would be better to set it in make_modifytable(), just
> after calling PlanDirectModify().

Actually, that's how it was done in earlier iterations but I think I
decided to move that into the FDW's functions due to some concern of
one of the other patches that depended on this patch. Maybe it makes
sense to bring that back into make_modifytable() and worry about the
other patch later.

> I'm also a bit unhappy with the way it's updated in set_plan_refs():
>
> > --- a/src/backend/optimizer/plan/setrefs.c
> > +++ b/src/backend/optimizer/plan/setrefs.c
> > @@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
> > rc->rti += rtoffset;
> > rc->prti += rtoffset;
> > }
> > + /*
> > + * Caution: Do not change the relative ordering of this loop
> > + * and the statement below that adds the result relations to
> > + * root->glob->resultRelations, because we need to use the
> > + * current value of list_length(root->glob->resultRelations)
> > + * in some plans.
> > + */
> > foreach(l, splan->plans)
> > {
> > lfirst(l) = set_plan_refs(root,
> > @@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root,
> > }
> >
> > fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
> > +
> > + /*
> > + * Adjust resultRelIndex if it's valid (note that we are called before
> > + * adding the RT indexes of ModifyTable result relations to the global
> > + * list)
> > + */
> > + if (fscan->resultRelIndex >= 0)
> > + fscan->resultRelIndex += list_length(root->glob->resultRelations);
> > }
> >
> > /*
>
> That "Caution" comment is well deserved, but could we make this more
> robust to begin with? The most straightforward solution would be to pass
> down the "current resultRelIndex" as an extra parameter to
> set_plan_refs(), similar to rtoffset. If we did that, we wouldn't
> actually need to set it before setrefs.c processing at all.

Hmm, I don't think I understand the last sentence. A given
ForeignScan node's resultRelIndex will have to be set before getting
to set_plan_refs(). I mean we shouldn't be making it a job of
setrefs.c to figure out which ForeignScan nodes need to have its
resultRelIndex set to a valid value.

> I'm a bit wary of adding another argument to set_plan_refs() because
> that's a lot of code churn, but it does seem like the most natural
> solution to me. Maybe create a new context struct to hold the
> PlannerInfo, rtoffset, and the new "currentResultRelIndex" value,
> similar to fix_scan_expr_context, to avoid passing through so many
> arguments.

I like the idea of a context struct. I've implemented it as a
separate refactoring patch (0001) and 0002 (what was before 0001)
extends it for "current ResultRelIndex", although I used the name
rroffset for "current ResultRelIndex" to go along with rtoffset.

> Another idea is to merge "resultRelIndex" and a "range table index" into
> one value. Range table entries that are updated would have a
> ResultRelInfo, others would not. I'm not sure if that would end up being
> cleaner or messier than what we have now, but might be worth trying.

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

> > 0002 removes es_result_relation_info in favor passing the active
> > result relation around as a parameter in the various functions that
> > need it
>
> Looks good.
>
> > 0003 Moves UPDATE tuple-routing logic into a new function
> >
> > 0004 removes the global variable TransitionCaptureState.tcs_map which
> > needed to be set/reset whenever the active result relation relation
> > changes in favor of a new field in ResultRelInfo to store the same map
>
> I didn't look closely, but these make sense at a quick glance.

Updated patches attached.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v12-0001-Refactor-set_plan_refs.patch application/octet-stream 8.4 KB
v12-0005-Refactor-transition-tuple-capture-code-a-bit.patch application/octet-stream 20.5 KB
v12-0002-Include-result-relation-index-if-any-in-ForeignS.patch application/octet-stream 14.3 KB
v12-0004-Rearrange-partition-update-row-movement-code-a-b.patch application/octet-stream 16.0 KB
v12-0003-Remove-es_result_relation_info.patch application/octet-stream 40.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-10-07 10:12:04 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Michael Banck 2020-10-07 09:18:14 Re: Add a log message on recovery startup before syncing datadir