Re: [HACKERS] SERIALIZABLE on standby servers

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] SERIALIZABLE on standby servers
Date: 2017-11-16 22:52:16
Message-ID: CANP8+jL1iVVVOnvLkUA7wq1p50ti+9==hRVX2+1iPbris-ZRnA@mail.gmail.com
Lists: pgsql-hackers

On 19 January 2017 at 16:16, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Nov 16, 2016 at 9:26 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> [..] Another solution
>> could be to have recovery on the standby detect tokens (CSNs
>> incremented by PreCommit_CheckForSerializationFailure) arriving out of
>> order, but I don't know what exactly it should do about that when it
>> is detected: you shouldn't respect an out-of-order claim of safety,
>> but then what should you wait for? Perhaps if the last replayed
>> commit record before that was marked SNAPSHOT_SAFE then it's OK to
>> leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
>> you have to wait for that one to be resolved by a follow-up snapshot
>> safety message and then rinse-and-repeat (take a new snapshot etc). I
>> think that might work, but it seems strange to allow random races on
>> the primary to create extra delays on the standby. Perhaps there is
>> some much simpler way to do all this that I'm missing.
>>
>> Another detail is that standbys that start up from a checkpoint and
>> don't see any SSI transactions commit don't yet have any snapshot
>> safety information, but defaulting to assuming that this point is safe
>> doesn't seem right, so I suspect it needs to be in checkpoints.
>>
>> Attached is a tidied up version which doesn't try to address the above
>> problems yet. When time permits I'll come back to this.
>
> I haven't looked at this again yet but a nearby thread reminded me of
> another problem with this which I wanted to restate explicitly here in
> the context of this patch. Even without replication in the picture,
> there is a race to reach ProcArrayEndTransaction() after
> RecordTransactionCommit() runs, which means that the DO history
> (normal primary server) and REDO history (recovery) don't always agree
> on the order that transactions become visible. With this patch, this
> kind of diverging DO and REDO could allow undetectable read-only
> serialization anomalies. I think that ProcArrayEndTransaction() and
> RecordTransactionCommit() need to be made atomic in the simple case so
> that DO and REDO agree.

Not atomic; we just need to make ProcArrayEndTransaction() apply
changes in the order of commits.

I think that is more easily possible by reusing the
SyncRepWaitForLSN() code, since that already orders things by LSN.

So make all committers wait, and then have the WAL writer wake them
after ProcArrayEndTransaction() has been applied.
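
To make the ordering idea concrete, here is a rough toy model, not
PostgreSQL code. It is single-threaded, and every name in it
(OrderedVisibility, record_commit, ready, _drain) is hypothetical.
record_commit() stands in for writing the commit record and getting an
LSN, ready() stands in for a backend reaching the point where it would
end its transaction, and _drain() plays the part of whoever wakes the
waiters. The only thing it tries to show is that commits become visible
in commit-LSN order even when backends arrive out of order.

```python
import heapq


class OrderedVisibility:
    """Toy model: make commits visible strictly in commit-LSN order."""

    def __init__(self):
        self.next_lsn = 1          # next LSN to hand out to a commit record
        self.pending = []          # min-heap of LSNs whose backends are waiting
        self.outstanding = set()   # LSNs logged but not yet visible
        self.visible_order = []    # order in which commits became visible

    def record_commit(self):
        """Stand-in for writing the commit record; returns its LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.outstanding.add(lsn)
        return lsn

    def ready(self, lsn):
        """A backend is ready to end its transaction and starts waiting."""
        heapq.heappush(self.pending, lsn)
        self._drain()

    def _drain(self):
        # Release a waiter only if no earlier-logged commit is still invisible.
        while self.pending and self.pending[0] == min(self.outstanding):
            lsn = heapq.heappop(self.pending)
            self.outstanding.remove(lsn)
            self.visible_order.append(lsn)


v = OrderedVisibility()
a, b, c = v.record_commit(), v.record_commit(), v.record_commit()
for lsn in (c, a, b):              # backends arrive out of order
    v.ready(lsn)
print(v.visible_order)             # [1, 2, 3]
```

The cost is visible in the model too: the backend holding LSN 3 arrives
first but has to wait until 1 and 2 have been applied.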

> Synchronous replication can make that more
> likely and it seems like some other approach is probably needed to
> delay visibility of not-yet-durable transactions while keeping the
> order that transactions become visible the same on all nodes.
> Aside from the problems I mentioned in my earlier message (race
> between snapshot safety decision and logging order, and lack of
> checkpointing of snapshot safety information), it seems like the two
> DO vs REDO problems (race to ProcArrayEndTransaction, and deliberately
> delayed visibility in syncrep) also need to be addressed before
> SERIALIZABLE DEFERRABLE on standbys could make a watertight
> guarantee.

While the difference in ordering is there, it would be useful to show
how that allows serialization anomalies iff the WAL ordering is already
known serializable.

Mixing robustness modes with access to the same data is avoidable by
design. We could add a parameter (or parameters) to prevent it if
desirable, but only to prevent documented problems.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
