Re: Potential G2-item cycles under serializable isolation

From: Kyle Kingsbury <aphyr(at)jepsen(dot)io>
To: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential G2-item cycles under serializable isolation
Date: 2020-06-02 16:18:52
Message-ID: 3150072e-13b8-7655-211b-17898f894e45@jepsen.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

OK! So I've designed a variant of this test which doesn't use ON CONFLICT.
Instead, we do a homebrew sort of upsert: we try to update the row in place by
primary key; if we see zero records updated, we insert a new row, and if *that*
fails due to the primary key conflict, we try the update again, under the theory
that since we now know a copy of the row exists, we should be able to update it.

https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L108

This isn't bulletproof; even under SERIALIZABLE, Postgres will allow a
transaction to fail to update any rows, then fail to insert due to a primary key
conflict--but because the primary key conflict forces a transaction abort (or
rolling back to a savepoint, which negates the conflict), I think it's still
serializable. The isolation docs explain this, I think.

Unfortunately, I'm still seeing tons of G2-item cycles. Whatever this is, it's
not related to ON CONFLICT.

As a side note, I can confirm that Postgres repeatable read is definitely weaker
than repeatable read--at least as formalized by Adya et al. This makes sense if
you understand repeatable read to mean SI (after all, the SI paper says they're
incomparable!), but the Postgres docs seem to imply Postgres is strictly
*stronger* than the ANSI SQL spec, and I'm not convinced that's entirely true:

https://www.postgresql.org/docs/12/transaction-iso.html

> The table also shows that PostgreSQL's Repeatable Read implementation does
not allow phantom reads. Stricter behavior is permitted by the SQL standard...

> This is a stronger guarantee than is required by the SQL standard for this
isolation level...

I get the sense that the Postgres docs have already diverged from the ANSI SQL
standard a bit, since SQL 92 only defines three anomalies (P1, P2, P3), and
Postgres defines a fourth: "serialization anomaly".

This results in a sort of weird situation: on the one hand, it's true: we don't
(I think) observe P1 or P2 under Postgres Repeatable Read. On the other hand,
SQL 92 says that the difference between repeatable read and serializable is
*exactly* the prohibition of P3 ("phantom"). Even though all our operations are
performed by primary key, we can observe a distinct difference between Postgres
repeatable read and Postgres serializable.

I can see two ways to reconcile this--one being that Postgres chose the anomaly
interpretation of the SQL spec, and the result is... maybe internally
inconsistent? Or perhaps one of the operations in this workload actually *is* a
predicate operation--maybe by dint of relying on a uniqueness constraint?

I'm surprised the transaction isolation docs don't say something like "Postgres
repeatable read level means snapshot isolation, which is incomparable with SQL
repeatable read." Then it'd be obvious that Postgres repeatable read exhibits
G2-item! Weirdly, the term "snapshot isolation" doesn't show up on the page at all!

--Kyle

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2020-06-02 16:50:47 Re: Potential G2-item cycles under serializable isolation
Previous Message Fabien COELHO 2020-06-02 11:27:48 Re: pgbench bug / limitation