Re: Potential G2-item cycles under serializable isolation

From: Kyle Kingsbury <aphyr(at)jepsen(dot)io>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential G2-item cycles under serializable isolation
Date: 2020-06-04 20:34:58
Message-ID: b54952d0-8f10-2091-eb29-f825af3b5f37@jepsen.io
Lists: pgsql-bugs

On 6/3/20 10:15 PM, Peter Geoghegan wrote:
> On Sun, May 31, 2020 at 7:25 PM Kyle Kingsbury <aphyr(at)jepsen(dot)io> wrote:
>> Which typically produces, after about a minute, anomalies like the following:
>>
>> G2-item #1
>> Let:
>> T1 = {:type :ok, :f :txn, :value [[:r 7 [1]] [:append 12 1]], :time 95024280,
>> :process 5, :index 50}
>> T2 = {:type :ok, :f :txn, :value [[:append 7 2] [:r 14 nil] [:append 14 1]
>> [:r 12 nil]], :time 98700211, :process 6, :index 70}
>>
>> Then:
>> - T1 < T2, because T1 did not observe T2's append of 2 to 7.
>> - However, T2 < T1, because T2 observed the initial (nil) state of 12, which
>> T1 created by appending 1: a contradiction!
>>
> Is the format of these anomalies documented somewhere?

Unfortunately no. This is a plain-text representation emitted by Elle. You'll
also find a corresponding diagram of the cycle in `elle/g2-item/1.svg`, which
might be a bit easier to understand. The transactions themselves ({:type :ok
...}) are EDN (Clojure) data structures representing the completion operations
from Jepsen; you'll also see this format in history.edn.
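To make the reasoning in the "Then:" section concrete, here is a rough Python sketch of how the two anti-dependencies can be read off the transactions above. It is only an illustration: the EDN maps are transcribed by hand into Python dicts, the helper names (`appends`, `missed_appends`) are made up for this example, and it assumes both transactions committed. Elle itself works from the whole history rather than from a single pair of transactions.

```python
# T1 and T2 from the anomaly above, transcribed from EDN by hand.
# Each micro-op is (function, key, value); EDN nil becomes None.
T1 = {"index": 50, "process": 5,
      "value": [("r", 7, [1]), ("append", 12, 1)]}
T2 = {"index": 70, "process": 6,
      "value": [("append", 7, 2), ("r", 14, None), ("append", 14, 1),
                ("r", 12, None)]}


def appends(txn):
    """Set of (key, element) pairs this transaction appended."""
    return {(k, v) for f, k, v in txn["value"] if f == "append"}


def missed_appends(reader, writer):
    """Appends made by `writer` that a read in `reader` did not observe.

    Assuming both transactions committed, each such pair means the read
    saw a state from before the append, so `reader` must precede `writer`.
    """
    written = appends(writer)
    missed = []
    for f, k, v in reader["value"]:
        if f != "r":
            continue
        observed = v or []  # a nil read means the key had no elements yet
        for wk, wv in written:
            if wk == k and wv not in observed:
                missed.append((k, wv))
    return sorted(missed)


# T1 did not observe T2's append of 2 to key 7, so T1 < T2.
print(missed_appends(T1, T2))
# T2 did not observe T1's append of 1 to key 12, so T2 < T1.
print(missed_appends(T2, T1))
```

Both calls return a non-empty list, which gives the two edges T1 < T2 and T2 < T1: the cycle reported as G2-item #1.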

> How can I determine what SQL each transaction generates from these values? It's
> not obvious to me which of the three tables (which of txn0, txn1, and txn2) are
> affected in each case.

This is a good and obvious question which I don't yet have a good answer for.
Reading the source gives you *some* idea of what SQL's being generated, but
there's some stuff being done by next.jdbc and JDBC itself, so I don't know how
to show you *exactly* what goes over the wire. A terrible way to do this is to
look at the pcap traces in Wireshark--you can correlate using the timestamps in
jepsen.log, or search for the transactions which interacted with specific keys.

One option would be to add some sort of tracing thing to the test so that it
records the SQL statements it generates as extra metadata on operations. I can
look into doing that for you later on. :)
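As a rough idea of what that tracing could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the real test is Clojure on top of next.jdbc talking to PostgreSQL, whereas this uses Python with an in-memory sqlite3 database so that it is self-contained. The `TracingConnection` wrapper, the `append_op` helper, the SQL text, the choice of `txn0`, and the `"sql"` key on the operation are all hypothetical and not what the test currently does.

```python
import sqlite3


class TracingConnection:
    """Wraps a connection and records every statement passed through it."""

    def __init__(self, conn):
        self.conn = conn
        self.trace = []

    def execute(self, sql, params=()):
        # Record the statement before running it, so failed statements
        # still show up in the trace.
        self.trace.append({"sql": sql, "params": list(params)})
        return self.conn.execute(sql, params)


def append_op(db, table, key, element):
    """Hypothetical list-append: extend a comma-separated text column,
    inserting a fresh row if the key does not exist yet."""
    cur = db.execute(
        f"UPDATE {table} SET val = val || ',' || ? WHERE id = ?",
        (str(element), key))
    if cur.rowcount == 0:
        db.execute(
            f"INSERT INTO {table} (id, val) VALUES (?, ?)",
            (key, str(element)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn0 (id INTEGER PRIMARY KEY, val TEXT)")

db = TracingConnection(conn)
op = {"type": "ok", "f": "txn", "value": [("append", 12, 1)]}
append_op(db, "txn0", 12, 1)
conn.commit()

# Attach the recorded statements to the operation as extra metadata.
op["sql"] = db.trace
for stmt in op["sql"]:
    print(stmt["sql"], stmt["params"])
```

With something along these lines, each completed operation in the history would carry the statements issued on its behalf, including which table they touched, so an anomaly could be mapped back to SQL without going through packet captures.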

--Kyle
