Re: Supporting multiple column assignment in UPDATE (9.5 project)

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Supporting multiple column assignment in UPDATE (9.5 project)
Date: 2014-05-02 19:34:23
Message-ID: CAHyXU0zz9uORDsbyUCwiuD3N+XS7J3ya9m2g-3_jgU6eceyHHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 2, 2014 at 1:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been thinking about how we might implement the <multiple column
> assignment> UPDATE syntax that was introduced in SQL:2003. This feature
> allows you to do
>
> UPDATE table SET ..., (column, column, ...) = <row-valued expression>, ...
>
> where the system arranges to evaluate the <row-valued expression> just
> once per row and then assign its fields into the specified target columns.
>
> Back in commit 6e8596a146c9b16f2c053ea8c6e361a114c0b65c we introduced
> some limited support for this syntax, but it only handles a <row-valued
> expression> that is a ROW() constructor, and it just does a simple
> syntactic transformation of pulling apart the ROW() constructor and
> building an independent assignment to each target column. The actually
> interesting uses for this feature don't work with that implementation
> approach. The most common case I've seen asked for is where the
> expression is a sub-SELECT returning multiple columns (but at most one
> row).
>
> As far as the parser is concerned, the main hurdle to supporting this
> feature is that the representation of an UPDATE's targetlist assumes that
> each list element is an independent TargetEntry representing a single
> assignment. Now, there is a heck of a lot of code that knows what
> targetlists look like, so I'm not eager to try to change that basic
> assumption. What seems like probably a better idea is to represent
> SET (target1, target2, target3) = foo
> as though it were
> SET target1 = fooref.col1, target2 = fooref.col2, target3 = fooref.col3
> where fooref is some Param-like reference to a separate list of
> expressions that have composite outputs. It would be understood that this
> separate targetlist would be evaluated just once before evaluating the
> main tlist. This approach would allow all the existing targetlist
> manipulation code to stay about the same. It would be a bit of a
> challenge for ruleutils.c to reconstruct the original syntax when printing
> an UPDATE in a rule, but I think that's just a small matter of
> programming. (Possibly it would help if the elements of the separate
> composite-values targetlist contained markers as to which main-tlist
> elements they were for.)
>
> Now, we could probably implement it straightforwardly just based on that
> idea, though it's not quite clear where to shoehorn evaluation of the
> separate targetlist into the constructed plan. One way would be to
> insert an additional evaluation level by adding a Result node on top
> of the normal plan, and then have the lower level compute the composite
> values as resjunk tlist elements, while the upper level does FieldSelects
> from the composite values to implement the "fooref.colN" references.
>
> However, I'm mainly interested in the sub-SELECT case; indeed, anything
> else you might want to do could be transformed into a sub-SELECT, so
> I wouldn't feel bad if we just restricted the new feature to that.
> And this doesn't seem like quite the right way to do it for sub-SELECTs.
>
> In the case of sub-SELECTs, we have almost the right execution mechanism
> already, in that initPlans are capable of setting multiple PARAM_EXEC
> runtime Params, one for each output column of the sub-SELECT. So what
> I called "fooref.col1" etc above could just be PARAM_EXEC Params referring
> to the subplan outputs --- except that initPlans are only for uncorrelated
> subqueries (those without any outer references to Vars of the parent query
> level). And the interesting cases for UPDATE generally involve correlated
> subqueries.
>
> What I'm thinking about this is that we ought to make an effort to unify
> the currently separate implementation paths for correlated and
> uncorrelated subqueries. Instead of SubPlans in the expression tree for
> correlated subqueries, I think they should all be treated much like
> initPlans are now, ie, there are PARAM_EXEC Params referencing outputs
> from a list of subqueries that are attached to the expression tree's
> parent plan node, and we lazily evaluate the subqueries upon first use of
> one of their output parameters. What would be different from the current
> handling of initPlans is that each time we advance to a new input row,
> we'd need to reset the evaluation state of the subqueries that are
> correlated. The reason for changing it like that is so that we can have
> multiple separate Params referencing different output columns of a single
> correlated subquery, and be sure that we evaluate the correlated subquery
> only once; the current SubPlan mechanism can't support separate references
> to the same subplan. Now, this would add a small amount of new bookkeeping
> overhead to use of correlated subqueries, but I find it hard to believe
> that that'd be noticeable compared to the startup/shutdown cost of the
> subquery.
>
> So, if we were to revise the handling of correlated subqueries like that,
> then for the case of a <row-valued expression> that is a sub-SELECT we
> wouldn't need any explicit runtime evaluation of "separate targetlist"
> entries. Use of Params referencing the subplan's outputs would be enough
> to cause evaluation to happen at the right times. At least for the first
> cut, I think it'd be enough to stop there, though maybe later we could
> deal with <row-valued expression>s that aren't sub-SELECTs by adding more
> planner or executor support.
>
> Anyway, I'm considering working on this for 9.5. Comments?

Couple quick questions:
1) how does this interplay with RETURNING? I guess it probably
doesn't change, but I imagine there's be no way to reference the
composite result in the RETURNING statement?

2) I often wish that you could reference the table (or it's alias)
directly as the field list.

UPDATE foo f set f = (...)::foo;
or even
UPDATE foo SET foo = foo;

I guess this probably isn't in the standard, but it would help doing
thing like writing trigger functions that upsert data from another
database with matching table definition (you can insert via
INSERT..SELECT (f::foo).* but not UPDATE without doing a full field
listing.

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-05-02 19:47:29 Re: Supporting multiple column assignment in UPDATE (9.5 project)
Previous Message Tom Lane 2014-05-02 19:31:30 Re: [COMMITTERS] pgsql: Fix "quiet inline" configure test for newer clang compilers.