Re: [HACKERS] SERIALIZABLE on standby servers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] SERIALIZABLE on standby servers
Date: 2020-06-15 07:36:41
Message-ID: CA+hUKGLRUwKNs0w+YzfsNF62O0L_gfbTFOJND1yMTorg95rJOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 15, 2020 at 5:00 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Fri, Dec 28, 2018 at 02:21:44PM +1300, Thomas Munro wrote:
> > Just to be clear, although this patch is registered in the commitfest
> > and currently applies and tests pass, it is prototype/WIP code with
> > significant problems that remain to be resolved. I sort of wish there
> > were a way to indicate that in the CF but since there isn't, I'm
> > saying that here. What I hope to get from Kevin, Simon or other
> > reviewers is some feedback on the general approach and problems
> > discussed upthread (and other problems and better ideas I might have
> > missed). So it's not seriously proposed for commit in this CF.
>
> No feedback has actually come, so I have moved it to next CF.

Having been nerd-sniped by SSI again, I spent some time this weekend
rebasing this old patch, making a few improvements, and reformulating
the problems to be solved as I see them. It's very roughly based on
Kevin Grittner and Dan Ports' description of how you could give
SERIALIZABLE a useful meaning on hot standbys. The short version of
the theory is that you can make it work like SERIALIZABLE READ ONLY
DEFERRABLE by adding a bit of extra information into the WAL stream.

Problems:

1. As a prerequisite, we'd need to teach primary servers to make
transactions visible in the same order that they log commits.
Otherwise, we permit nonsense like seeing TX1 but not TX2 on the
primary, and TX2 but not TX1 on the replica. You can probably argue
that our read replicas don't satisfy the lower isolation levels, let
alone serializable.

2. Similarly, it's probably not OK that
PreCommit_CheckForSerializationFailure() determines
MySerializableXact->snapshotSafetyAfterThisCommit. That may not
happen in exactly the same order as commits are logged. Or maybe
there is some argument for why that is OK, based on what we're doing
with prepareSeqNo, or maybe we can do something with that to detect
disorder.

3. The patch doesn't yet attempt to checkpoint the snapshot safety
state. That's needed to start up in a sane state, without having to
wait for WAL activity.

4. XactLogSnapshotSafetyRecord() flushes the WAL an extra time after
a commit is flushed, which I put in for testing; that's silly...
somehow it needs to be better integrated so we don't generate two sync
I/Os in a row.

5. You probably want a way to turn off the extra WAL records and
SERIALIZABLEXACT consumption if you're using SERIALIZABLE on a primary
but not on the standby. Or maybe there is some way to make it come on
automatically.

I think I have cleared up the matter of xmin tracking for
"hypothetical" SERIALIZABLEXACTs mentioned earlier. It's not needed,
so should be set to InvalidTransactionId, and I added a comment to
explain why.

I also wrote a TAP test to exercise this thing. It is the same
schedule as src/test/isolation/specs/read-only-anomaly-3.spec, except
that transaction 3 runs on a streaming replica.

One thing to point out is that this patch only aims to make it so that
streaming replicas can't observe a state that would have caused a
transaction to abort if it had been observed on the primary. The TAP
test still has to insert its own wait-for-LSN loop to make sure step
"s1c" is replayed before "s3r" runs. We could use
synchronous_commit=remote_apply, and that'd probably work just as well
for this particular test, but I'm not sure how to square that with
fixing problem #1 above.

The perl hackery I used to do overlapping transactions in a TAP test
is pretty crufty. I guess we'd ideally have the isolation tester
support per-session connection strings, and somehow get some perl code
to orchestrate the cluster setup but then run the real isolation
tester. Or something like that.

Attachment Content-Type Size
v5-0001-SERIALIZABLE-READ-ONLY-DEFERRABLE-on-streaming-re.patch text/x-patch 38.0 KB
v5-0002-Add-perl-module-for-isolation-like-testing.patch text/x-patch 3.2 KB
v5-0003-Add-test-for-SERIALIZABLE-on-streaming-replicas.patch text/x-patch 3.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-06-15 07:47:01 Re: doc examples for pghandler
Previous Message Kyotaro Horiguchi 2020-06-15 07:35:22 Re: min_safe_lsn column in pg_replication_slots view