WIP patch for multiple column assignment in UPDATE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WIP patch for multiple column assignment in UPDATE
Date: 2014-06-16 15:17:21
Message-ID: 20930.1402931841@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a very-much-WIP patch for supporting
UPDATE foo SET ..., (a,b,c) = (select x,y,z from ...), ...

It lacks documentation, ruleutils.c support, or any solution for the
rule NEW.* expansion problem I mentioned Saturday. The reason I'm
posting it now is to get feedback about an implementation choice that
feels a bit klugy to me; but I don't see any clearly better idea.

The post-parse-analysis representation I've chosen here is that the
output columns of the sub-select are represented by PARAM_MULTIEXEC
Params, and the sub-select itself appears in a resjunk entry at the end
of the targetlist; that is, the UPDATE tlist is more or less like

$1, -- to be assigned to a
$2, -- to be assigned to b
$3, -- to be assigned to c
(select x,y,z from ...), -- resjunk entry, value will be discarded

If the sub-select is uncorrelated with the outer query, the planner
turns it into an InitPlan, replacing the resjunk tlist item with a
NULL constant, and then everything happens normally at execution.

But more usually, the sub-select will be correlated with the outer
query. In this case, the subquery turns into a SubPlan tree that
stays in the resjunk item. At the start of execution, the ParamExecData
structs for each of its output Params are marked with execPlan pointers
to the subplan, just as would happen for an InitPlan. This causes the
subplan to get executed when the first of the output Params is evaluated;
it loads the ParamExecData structs for all its output Params, and then
the later Params just take data from the structs. When execution reaches
the MULTIEXEC SubPlan in the resjunk tlist item, no evaluation of the
subplan is needed; but instead we re-mark all the output ParamExecData
structs with non-null execPlan pointers, so that a fresh execution of
the subplan will happen in the next evaluation of the targetlist.

The klugy aspect of this is that it assumes that the SubPlan item will
appear later in the tlist than all the Params referencing it. This is
true at the moment because resjunk tlist items always appear after
non-resjunk ones. There are a few other places that already depend on
this ordering, but we've mostly tried to avoid introducing new
dependencies on it.

The alternative that I'd originally had in mind, before put-it-in-a-
resjunk-item occurred to me, was to invent a new "secondary tlist"
field of Query and of ModifyTable plan nodes, as I sketched back in
http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us
We'd put the MULTIEXEC SubPlans in this secondary tlist and expect
the executor to evaluate it just before evaluating the main tlist.
However, that way isn't terribly pretty either, because it extends
knowledge of this feature to a *whole lot* of places that don't have
to know about it in the attached approach; in particular, just about
every place that manipulates targetlist contents would have to also
manipulate the secondary tlist.

Another approach is to explicitly identify which of the Params will
be evaluated first and replace it with a node tree that evaluates
the subplan (and sets output Params for the remaining columns).
This is a bit messy because the first-to-be-evaluated is dependent
on the targetlist reordering that the planner does; so we don't want
parse analysis to try to do it. (If we allowed parse analysis to
know that the planner will sort the tlist items into physical column
order, we could do it like that; but then it would break if we ever
get around to allowing ALTER TABLE to change the physical order.)
We could safely have setrefs.c determine the first-to-be-evaluated
Param, though, since it will traverse the tlist in final order.
So if we went with this approach I'd have setrefs.c replace the first
Param with a SubPlan node. That seems a bit of a kluge too though.

Preferences, comments, better ideas?

regards, tom lane

Attachment Content-Type Size
multi-assignment-1.patch text/x-diff 59.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-06-16 15:28:35 Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Previous Message Abhijit Menon-Sen 2014-06-16 14:59:00 [REVIEW] Re: postgresql.auto.conf read from wrong directory