Re: [HACKERS] MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-02-16 01:07:08
Message-ID: CAH2-WzmakCf-HnQrewaEgToTokxHZnawDoOGtuOMowsNzQ2USg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 11, 2018 at 11:09 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> On Fri, Feb 9, 2018 at 6:53 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> My concern is mostly just that MERGE manages to behave in a way that
>> actually "provides a single SQL statement that can conditionally
>> INSERT, UPDATE or DELETE rows, a task that would otherwise require
>> multiple procedural language statements", as the docs put it. As long
>> as MERGE manages to do something as close to that high level
>> description as possible in READ COMMITTED mode (with our current
>> semantics for multiple statements in RC taken as the baseline), then
>> I'll probably be happy.
>
>
> IMO it will be quite hard, if not impossible, to guarantee the same
> semantics to a single statement MERGE and multi statement
> UPDATE/DELETE/INSERT in RC mode. For example, the multi statement model will
> execute each statement with a new MVCC snapshot and hence the rows visible
> to individual statement may vary. Whereas in MERGE, everything runs with a
> single snapshot. There could be other such subtle differences.

I didn't mean this literally. For simple cases, an EPQ walk of the
update chain is kind of like acquiring a new snapshot.

ISTM that a MERGE isn't really a thing that replaces 2 or 3 other DML
statements, at least in most cases. It's more like a replacement for
procedural code with an outer join, with an INSERT, UPDATE or DELETE
that affects zero or one rows inside the procedural loop that
processes matching/non-matching rows. The equivalent procedural code
could ultimately perform *thousands* of snapshot acquisitions for
thousands of RC DML statements. MERGE is sometimes explained in terms
of "here is the kind of procedural code that you don't have to write
anymore, thanks to MERGE" -- that's what the code looks like.

I attach a rough example of this, that uses plpgsql.

>> Some novel new behavior -- "EPQ with a twist"-- is clearly necessary.
>> I feel a bit uneasy about it because anything that anybody suggests is
>> likely to be at least a bit arbitrary (EPQ itself is kind of
>> arbitrary). We only get to make a decision on how "EPQ with a twist"
>> will work once, and that should be a decision that is made following
>> careful deliberation. Ambiguity is much more likely to kill a patch
>> than a specific technical defect, at least in my experience. Somebody
>> can usually just fix a technical defect.
>
>
> While I agree, I think we need to make these decisions in a time bound
> fashion. If there is too much ambiguity, then it's not a bad idea to settle
> for throwing appropriate errors instead of providing semantically wrong
> answers, even in some remote corner case.

Everything is still on the table, I think.

> Ok. I am now back from holidays and I will too start thinking about this.
> I've also requested a colleague to help us with comparing it against
> Oracle's behaviour. N That's not a gold standard for us, but knowing how
> other major databases handle RC conflicts, is not a bad idea.

The fact that Oracle doesn't allow WHEN MATCHED ... AND quals did seem
like it might be significant to me.

I think that any theoretical justification for one behavior over
another will be hard for anyone to come up with. As I said before,
this is not an area where something like the SQL standard provides us
with a platonic ideal. The best behavior is likely to be one that
lives up to the high level description of MERGE, is as close as
possible to existing behaviors, and is not otherwise surprising.

> I see the following important areas and as long as we have a consistent and
> coherent handling of these cases, we should not have difficulty agreeing on
> a outcome.
>
> 1. Concurrent UPDATE does not affect MATCHED case. The WHEN conditions may
> or may not be affected.
> 2. Concurrently UPDATEd tuple fails the join qual and the current source
> tuple no longer matches with the updated target tuple that the EPQ is set
> for. It matches no other target tuple either. So a MATCHED case is turned
> into a NOT MATCHED case.
> 3. Concurrently UPDATEd tuple fails the join qual and the current source
> tuple no longer matches with the updated target tuple that the EPQ is set
> for. But it matches some other target tuple. So it's still a MATCHED case,
> but with different target tuple(s).
> 4. Concurrent UPDATE/INSERT creates a matching target tuple for a source
> tuple, thus turning a NOT MATCHED case to a MATCHED case.
> 5. Concurrent DELETE turns a MATCHED case into NOT MATCHED case
>
> Any other case that I am missing? Assuming all cases are covered, what
> should we do in each of these cases, so that there is no or very little
> ambiguity and the outcome seems consistent (at the very least as far as
> MERGE goes and hopefully with regular UPDATE/DELETE handling)

Uhhh...I still need to spend more time on this. Sorry.

> It seems natural that #5 should skip the MATCHED action and instead execute
> the first satisfying NOT MATCHED action. But it's outcome may depend on how
> we handle #2 and #3 so that they are all consistent. If we allow #2 and #3
> to error out, whenever there is ambiguity, we should do the same for #5.

I definitely agree on behaving consistently in the way you describe here.

--
Peter Geoghegan

Attachment Content-Type Size
merge.sql application/sql 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-02-16 01:49:12 Re: non-bulk inserts and tuple routing
Previous Message Andreas Karlsson 2018-02-15 23:42:00 Re: JIT compiling with LLVM v9.1