Re: partition routing layering in nodeModifyTable.c

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, amitdkhan(dot)pg(at)gmail(dot)com
Subject: Re: partition routing layering in nodeModifyTable.c
Date: 2020-10-05 15:45:15
Message-ID: 73fe6ce6-d2a1-bd4b-17cf-9deba4e91467@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13/07/2020 08:47, Amit Langote wrote:
> It's been over 11 months since there was any significant commentary on
> the contents of the patches themselves, so perhaps I should reiterate
> what the patches are about and why it might still be a good idea to
> consider them.
>
> The thread started with some very valid criticism of the way
> executor's partition tuple routing logic looks randomly sprinkled over
> in nodeModifyTable.c, execPartition.c. In the process of making it
> look less random, we decided to get rid of the global variable
> es_result_relation_info to avoid complex maneuvers of
> setting/resetting it correctly when performing partition tuple
> routing, causing some other churn beside the partitioning code. Same
> with another global variable TransitionCaptureState.tcs_map. So, the
> patches neither add any new capabilities, nor improve performance, but
> they do make the code in this area a bit easier to follow.
>
> Actually, there is a problem that some of the changes here conflict
> with patches being discussed on other threads ([1], [2]), so much so
> that I decided to absorb some changes here into another "refactoring"
> patch that I have posted at [2].

Thanks for the summary. It's been a bit hard to follow what depends on
what across these threads, and how they work together. It seems that
this patch set is the best place to start.

> Attached rebased patches.
>
> 0001 contains preparatory FDW API changes to stop relying on
> es_result_relation_info being set correctly.

Makes sense. The only thing I don't like about this is the way the
ForeignScan->resultRelIndex field is set. make_foreignscan() initializes
it to -1, and the FDW's PlanDirectModify() function is expected to set
it, like you did in postgres_fdw:

> @@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
> rebuild_fdw_scan_tlist(fscan, returningList);
> }
>
> + /*
> + * Set the index of the subplan result rel.
> + */
> + fscan->resultRelIndex = subplan_index;
> +
> table_close(rel, NoLock);
> return true;
> }

It has to be set to that value (subplan_index is an argument to
PlanDirectModify()), the FDW doesn't have any choice there, so this is
just additional boilerplate code that has to be copied to every FDW that
implements direct modify. Furthermore, if the FDW doesn't set it
correctly, you could have some very interesting results, like updating
wrong table. It would be better to set it in make_modifytable(), just
after calling PlanDirectModify().

I'm also a bit unhappy with the way it's updated in set_plan_refs():

> --- a/src/backend/optimizer/plan/setrefs.c
> +++ b/src/backend/optimizer/plan/setrefs.c
> @@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
> rc->rti += rtoffset;
> rc->prti += rtoffset;
> }
> + /*
> + * Caution: Do not change the relative ordering of this loop
> + * and the statement below that adds the result relations to
> + * root->glob->resultRelations, because we need to use the
> + * current value of list_length(root->glob->resultRelations)
> + * in some plans.
> + */
> foreach(l, splan->plans)
> {
> lfirst(l) = set_plan_refs(root,
> @@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root,
> }
>
> fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
> +
> + /*
> + * Adjust resultRelIndex if it's valid (note that we are called before
> + * adding the RT indexes of ModifyTable result relations to the global
> + * list)
> + */
> + if (fscan->resultRelIndex >= 0)
> + fscan->resultRelIndex += list_length(root->glob->resultRelations);
> }
>
> /*

That "Caution" comment is well deserved, but could we make this more
robust to begin with? The most straightforward solution would be to pass
down the "current resultRelIndex" as an extra parameter to
set_plan_refs(), similar to rtoffset. If we did that, we wouldn't
actually need to set it before setrefs.c processing at all.

I'm a bit wary of adding another argument to set_plan_refs() because
that's a lot of code churn, but it does seem like the most natural
solution to me. Maybe create a new context struct to hold the
PlannerInfo, rtoffset, and the new "currentResultRelIndex" value,
similar to fix_scan_expr_context, to avoid passing through so many
arguments.

Another idea is to merge "resultRelIndex" and a "range table index" into
one value. Range table entries that are updated would have a
ResultRelInfo, others would not. I'm not sure if that would end up being
cleaner or messier than what we have now, but might be worth trying.

> 0002 removes es_result_relation_info in favor passing the active
> result relation around as a parameter in the various functions that
> need it

Looks good.

> 0003 Moves UPDATE tuple-routing logic into a new function
>
> 0004 removes the global variable TransitionCaptureState.tcs_map which
> needed to be set/reset whenever the active result relation relation
> changes in favor of a new field in ResultRelInfo to store the same map

I didn't look closely, but these make sense at a quick glance.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-10-05 16:03:55 Re: [HACKERS] Custom compression methods
Previous Message James Coleman 2020-10-05 15:33:10 Re: enable_incremental_sort changes query behavior