Re: Support UPDATE table SET(*)=...

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support UPDATE table SET(*)=...
Date: 2015-04-08 09:44:26
Message-ID: 87h9srouxu.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> I spent a fair amount of time cleaning this patch up to get it
Tom> into committable shape, but as I was working on the documentation
Tom> I started to lose enthusiasm for it, because I was having a hard
Tom> time coming up with compelling examples. The originally proposed
Tom> motivation was

>>> It solves the problem of doing UPDATE from a record variable of the same
>>> type as the table e.g. update foo set (*) = (select foorec.*) where ...;

There are a number of motivating examples for this (which have nothing
to do with rules; I doubt anyone cares much about that).

The fundamental point is that currently, given a table "foo" and some
column or variable of foo's rowtype, you can do this:

insert into foo select foorec.* [from ...]

but there is no comparable way to do an update without naming every
column explicitly, the closest being

update foo set (a,b,...) = (foorec.a, foorec.b, ...) where ...

One example that comes up occasionally (and that I've had to do myself
more than once) is this: given a table "foo" and another with identical
schema "reference_foo", apply appropriate inserts, updates and deletes
to table "foo" to make the content of the two tables identical. This can
be done these days with wCTEs:

with
t_diff as (select o.id as o_id, n.id as n_id, o, n
from foo o full outer join reference_foo n on (o.id=n.id)
where (o.*) is distinct from (n.*)),
ins as (insert into foo select (n).* from t_diff where o_id is null),
del as (delete from foo
where id in (select o_id from t_diff where n_id is null)),
upd as (update foo
set (col1,col2,...) = ((n).col1,(n).col2,...) -- XXX
from t_diff
where foo.id = n_id and o_id = n_id)
select count(*) filter (where o_id is null) as num_ins,
count(*) filter (where o_id = n_id) as num_upd,
count(*) filter (where n_id is null) as num_del
from t_diff;

(This would be preferred over simply replacing the table content if the
table is large and the changes few, you want to audit the changes, you
need to avoid interfering with concurrent selects, etc.)

The update part of that would be much improved by simply being able to
say "update all columns of foo with values from (n)". The exact syntax
isn't a big deal - though since SET (cols...) = ... is in the spec, it
seems reasonable to at least keep some kind of consistency with it.

Other examples arise from things one might want to do in plpgsql; for
example to update a record from an hstore or json value, one can use
[json_]populate_record to construct a record variable, but then it's
back to naming all the columns in order to actually perform the update
statement.

[My connection with this patch is only that I suggested it to Atri as a
possible project for him to do, because I wanted the feature and knew
others did also, and helped explain how the existing MultiAssign worked
and some of the criticism. I did not contribute any code.]

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-04-08 10:00:36 Re: Parallel Seq Scan
Previous Message Jan Urbański 2015-04-08 09:26:19 Re: libpq's multi-threaded SSL callback handling is busted