Re: MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MERGE SQL Statement for PG11
Date: 2017-11-06 21:50:03
Message-ID: 20171106215003.GA30357@marmot
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>In step 3 we discover that an entry exists in the index for a committed row.
>
>Since we have a unique index we use it to locate the row we know
>exists and UPDATE that.
>
>We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
>violating MVCC for UPDATEs, so why does it matter if we do it for
>INSERTs also?

Before I go on to say why I think that this approach is problematic, I
want to point out a few things that I think we actually agree on:

* EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE
conflict handling. It has more to do with how VACUUM works than about
some platonic ideal that everyone agrees on.

* We can imagine other alternatives, such as the behavior in Oracle
(statement level rollback + optimistic retry).

* Those alternatives are probably better in some ways but worse in other
ways.

* EPQ violates snapshot consistency, even though that's not inherently
necessary to avoid "READ COMMITTED serialization errors".

* ON CONFLICT also violates snapshot consistency, in rather a different
way. (Whether or not this is necessary is more debatable.)

I actually think that other MVCC systems don't actually copy Oracle here,
either, and for similar pragmatic reasons. It's a mixed bag.

>Where hides the problem?

The problem is violating MVCC is something that can be done in different
ways, and by meaningful degrees:

* EPQ semantics are believed to be fine because we don't get complaints
about it. I think that that's because it's specialized to UPDATEs and
UPDATE-like operations, where we walk an UPDATE chain specifically,
and only use a dirty snapshot for the chain's newer tuples.

* ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no
distinction between a concurrent UPDATE, and a concurrent DELETE + fresh
INSERT. It's specialized to CONFLICTs.

This might seem abstract, but it has real, practical implications.
Certain contradictions exist when you start with MVCC semantics, then
fall back to EPQ semantics, then finally fall back to ON CONFLICT
semantics.

Questions about mixing these two things:

* What do we do if someone concurrently UPDATEs in a way that makes the
qual not pass during EPQ traversal? Should we INSERT when that
happens?

* If so, what about the case when the MERGE join qual/unique index
values didn't change (just some other attributes that do not pass the
additional WHEN MATCHED qual)?

* What about when there was a concurrent DELETE -- should we INSERT then?

ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's
hugely different to doing it the other way around: starting from your
own MVCC snapshot qual, and going to a CONFLICT. This is because
evaluating the DO UPDATE's WHERE clause is just one little extra step
after the one and only latest row for that value has been locked. You
could theoretically go this way with 2PL, I think, because that's a bit
like locking every row that the predicate touches, but of course that
isn't at all practical.

I should stop trying to make a watertight case against this, even though
I still think that's possible. For now, instead, I'll just say that this
is *extremely* complicated, and still has unresolved questions about
semantics.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Юрий Соколов 2017-11-06 21:58:35 Re: Small improvement to compactify_tuples
Previous Message Simon Riggs 2017-11-06 21:27:54 Re: SQL procedures