Re: [HACKERS] MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-02-01 19:39:23
Message-ID: CAH2-WzmMpW97m_uAiVfTF1swA06KREbEkKftX6WBcYoAVi867g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Thu, Feb 1, 2018 at 4:45 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think it would be very helpful if we could discuss everything with
> direct relevance to v14, so this becomes a patch review, not just a
> debate.

I wish I could give you a clear answer on the way forward with total
confidence, or even moderate confidence, but right now I can't.
Hopefully I'll be able to convince myself that I've understood all the
nuances of this EPQ + WHEN ... AND qual business shortly, perhaps in
the next couple of days, at which point I'll have a position that I'm
willing to defend. I don't even have that much right now.

On a more positive note, I do agree with you that this only affects a
small subset of real-world queries (even if we assume READ COMMITTED
conflicts are common). To put it another way, I would be able to give
you a simple opinion right now if WHEN ... AND quals were prohibited.
Not that I'm proposing actually prohibiting them. (BTW, don't you
think it's interesting that Oracle doesn't allow them, but instead
requires WHERE clauses, even on an INSERT?)

Leaving concurrency aside for a moment, I want to talk about bugs. I
read through this:

https://wiki.postgresql.org/wiki/SQL_MERGE_Patch_Status#Bugs

This suggestion you make, which is that the only problem with wholerow
vars is that there is a weird error message (and they're actually
unsupported) is not helping your cause. While I think that users will
expect that to work, that isn't really the main point. I tried using
wholerow vars as a smoke test for setrefs.c handling, as well as the
handling of targetlists within the rewriter. The fact that wholerow
vars don't work is likely indicative of your general approach in these
areas needing to be thought through in more detail. That's the really
important thing. Wholerow vars shouldn't be a special case to the
underlying implementation, and if you found a way to make them work
that was a special case that would seem questionable to me for similar
reasons. Sometimes wholerow vars actually *do* work with your patch --
the problem only happens with data_source whole-row vars, and never
target_table_name wholerow vars:

postgres=# merge into testoids a using (select i "key", 'foo' "data"
from generate_series(0,3) i) b on a.key = b.key when matched then
update set data = a.*::text when not matched then insert (key, data)
values (b.key, 'foo');
MERGE 4
postgres=# merge into testoids a using (select i "key", 'foo' "data"
from generate_series(0,3) i) b on a.key = b.key when matched then
update set data = b.*::text when not matched then insert (key, data)
values (b.key, 'foo');
ERROR: variable not found in subplan target lists

There is also the matter of subselects in the update targetlist, which
you similarly claim is only a problem of having the wrong error
message. The idea that those are unsupported for any principled reason
doesn't have any justification (unlike WHEN ... AND quals, and their
restrictions, which I agree are necessary). It clearly works with
Oracle, for example:

http://sqlfiddle.com/#!4/2d5405/10

You're reusing set_clause_list in the grammar, so I don't see why it
shouldn't work within MERGE in just the same way as it works in
UPDATE. While I think that there is a legitimate need for restrictions
on some merge_when_clause cases, such as the VALUES() of merge_insert,
this isn't an example of that. Again, this suggests to me a need for
more work within the optimizer.

Finally, I noticed a problem with your new EXPLAIN ANALYZE instrumentation:

Is it 4 rows inserted, or 0 inserted?

postgres=# merge into testoids a using (select i "key", 'foo' "data"
from generate_series(0,3) i) b on a.key = b.key when matched and 1=0
then update set data = b.data when not matched then insert (key, data)
values (b.key, 'foo');
MERGE 0
postgres=# explain analyze merge into testoids a using (select i
"key", 'foo' "data" from generate_series(0,3) i) b on a.key = b.key
when matched and 1=0 then update set data = b.data when not matched
then insert (key, data) values (b.key, 'foo');
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Merge on testoids a (cost=38.58..61.19 rows=1000 width=42) (actual
time=0.043..0.049 rows=4 loops=1)
Tuples Inserted: 4
Tuples Updated: 0
Tuples Deleted: 0
-> Hash Left Join (cost=38.58..61.19 rows=1000 width=42) (actual
time=0.039..0.043 rows=4 loops=1)
Hash Cond: (i.i = a.key)
-> Function Scan on generate_series i (cost=0.00..10.00
rows=1000 width=4) (actual time=0.009..0.010 rows=4 loops=1)
-> Hash (cost=22.70..22.70 rows=1270 width=10) (actual
time=0.021..0.021 rows=4 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
-> Seq Scan on testoids a (cost=0.00..22.70 rows=1270
width=10) (actual time=0.014..0.016 rows=4 loops=1)
Planning time: 0.202 ms
Execution time: 0.109 ms
(12 rows)

(It should be 0 rows inserted, not 4.)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-02-01 20:11:35 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Robert Haas 2018-02-01 19:03:23 Re: [HACKERS] Refactoring identifier checks to consistently use strcmp