Re: Potential G2-item cycles under serializable isolation

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kyle Kingsbury <aphyr(at)jepsen(dot)io>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential G2-item cycles under serializable isolation
Date: 2020-06-13 18:37:08
Message-ID: CAH2-Wz=WK-yvYhhJnu+3fgraw_zd23275pAKKk0PCdEQvU0U=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jun 12, 2020 at 9:49 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Slightly improved version, bringing some wording into line with
> existing documentation. s/SQL Standard/SQL standard/, and explicitly
> referring to "locking" implementations of RR and Ser (as we do already
> a few paragraphs earlier, when discussing MVCC). My intention is to
> push this to all branches in a couple of days if there is no other
> feedback. I propose to treat it as a defect, because I agree that
> it's weird and surprising that we don't mention SI, especially
> considering the history of the standard levels.

That seems reasonable to me.

The doc patch says "some other systems may even offer Repeatable Read
and Snapshot Isolation as distinct isolation levels". This is true of
SQL Server, which has an explicit snapshot isolation level, though
possibly only because SI was retrofitted to SQL Server 2005 -- I guess
that that's what you have in mind. Perhaps you should reword this a
little to convey that you mean a separate isolation mode *and* an
actually-distinct set of behaviors to go with it.

> I mean, I guess it's
> basically implied by all the stuff that section says about MVCC vs
> traditional locking systems, and it's a super well known fact in our
> hacker community, but not using the standard term of art is a strange
> omission.

Yes, it's definitely an omission.

> In future release perhaps we could entertain ideas like accepting the
> name SNAPSHOT ISOLATION, and writing some more use-friendly guidance,
> and possibly even reference the Generalized Isolation Level
> Definitions stuff. I think it'd be a bad idea to stop accepting
> REPEATABLE READ and inconvenience our users, though; IMHO it's
> perfectly OK to stick with the current interpretation of the spec
> while also acknowledging flaws and newer thinking through this new
> paragraph.

Kyle made the following observation about "true" RR at one point: "If
you use the broad/generalized interpretation, you can trust that a
program which only accesses data by primary key, running under
repeatable read, is actually serializable". However, the simple fact
is that it just wouldn't be sensible for Postgres to offer an
according-to-Adya REPEATABLE READ mode that isn't just an alias for
SERIALIZABLE/SSI. What does the leeway that RR gives us compared to
SERIALIZABLE actually buy the implementation, performance-wise? This
is similar to the situation with READ UNCOMMITED and READ COMMITTED in
Postgres.

Even though I am sympathetic to Kyle's concerns here, and am all in
favor of this documentation patch, I believe that the current naming
scheme is the least-worst option. The point of using the standard
names is to make it easy to understand the guarantees on offer. But we
need to call the current REPEATABLE READ behavior *something*, and the
closest thing that the standard has is RR. Unless we're willing to
make no distinction between SSI and snapshot isolation -- which we're
clearly not. Or unless we're willing to exclusively use some
non-standard name instead -- which is also clearly a non-starter
(though I'd be fine with an alias).

Speaking of comparing behaviors across systems, the behavior that
MySQL calls REPEATABLE READ mode is actually READ COMMITTED, while the
behavior that DB2 calls REPEATABLE READ is actually SERIALIZABLE. The
range of guarantees that you actually get are *enormous* in reality!
ISTM that REPEATABLE READ is by far the most confusing and ambiguous
isolation level that the standard describes.

Even Elle can't tell the difference between "true" REPEATABLE READ
mode and SERIALIZABLE mode; why should Postgres users be able to?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2020-06-14 10:51:58 Re: BUG #16492: DROP VIEW IF EXISTS error
Previous Message Nina Marlow 2020-06-13 11:06:33 Re: BUG #16492: DROP VIEW IF EXISTS error