Re: Causal reads take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Causal reads take II
Date: 2017-01-19 12:22:39
Message-ID: CAEepm=15WC7A9Zdj2Qbw3CUDXWHe69d=nBpf+jXui7OYXXq11w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 19, 2017 at 8:11 PM, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
> On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Long term, I think it would be pretty cool if we could develop a set
>> of features that give you distributed sequential consistency on top of
>> streaming replication. Something like (this | causality-tokens) +
>> SERIALIZABLE-DEFERRABLE-on-standbys[3] +
>> distributed-dirty-read-prevention[4].
>
> Is it necessary that causal writes wait for replication before making
> the transaction visible on the master? I'm asking because the per tx
> variable wait time between logging commit record and making
> transaction visible makes it really hard to provide matching
> visibility order on master and standby.

Yeah, that does seem problematic. Even with async replication or no
replication, isn't there already a race in CommitTransaction() where
two backends could reach RecordTransactionCommit() in one order but
ProcArrayEndTransaction() in the other order? AFAICS using
synchronous replication in one of the transactions just makes it more
likely you'll experience such a visibility difference between the DO
and REDO histories (!), by making RecordTransactionCommit() wait.
Nothing prevents you getting a snapshot that can see t2 but not t1 in
the DO history, while someone doing PITR or querying an asynchronous
standby gets a snapshot that can see t1 but not t2 because those
replay the REDO history.
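To make the race concrete, here is a toy model, not PostgreSQL code. It assumes a commit happens in two separate steps: log the commit record (loosely standing in for RecordTransactionCommit()), then become visible (loosely standing in for ProcArrayEndTransaction()). It also assumes that replay exposes transactions strictly in WAL order. The class and function names are made up for illustration.

```python
class Primary:
    """Toy primary: commit = log a commit record, then (later) become visible."""

    def __init__(self):
        self.wal = []          # (lsn, xid) commit records, in log order
        self.visible = set()   # xids that new snapshots can see
        self.next_lsn = 100

    def log_commit(self, xid):
        # Step 1, loosely modelled on RecordTransactionCommit().
        lsn = self.next_lsn
        self.next_lsn += 100
        self.wal.append((lsn, xid))
        return lsn

    def make_visible(self, xid):
        # Step 2, loosely modelled on ProcArrayEndTransaction().
        self.visible.add(xid)

    def snapshot(self):
        return frozenset(self.visible)


def redo_snapshots(wal):
    """Every state a standby or PITR target can expose when replaying in LSN order."""
    seen, snaps = set(), [frozenset()]
    for _lsn, xid in sorted(wal):
        seen.add(xid)
        snaps.append(frozenset(seen))
    return snaps


p = Primary()
do_snaps = [p.snapshot()]
p.log_commit("t1")        # t1 logs first, then waits (e.g. for a sync standby)
p.log_commit("t2")        # t2 logs second ...
p.make_visible("t2")      # ... but becomes visible first
do_snaps.append(p.snapshot())
p.make_visible("t1")
do_snaps.append(p.snapshot())

redo_snaps = redo_snapshots(p.wal)
print(frozenset({"t2"}) in do_snaps, frozenset({"t2"}) in redo_snaps)  # True False
print(frozenset({"t1"}) in do_snaps, frozenset({"t1"}) in redo_snaps)  # False True
```

The DO history contains a snapshot that sees t2 but not t1. The REDO history contains one that sees t1 but not t2. Neither history ever produces the other's intermediate state.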

> In CSN based snapshot
> discussions we came to the conclusion that to make standby visibility
> order match master while still allowing for async transactions to
> become visible before they are durable we need to make the commit
> sequence a vector clock and transmit extra visibility ordering
information to standbys. Having one more level of delay between WAL
> logging of commit and making it visible would make the problem even
> worse.

I'd like to read that... could you please point me at the right bit of
that discussion?

> One other thing that might be an issue for some users is that this
> patch only ensures that clients observe forwards progress of database
> state after a writing transaction. With two consecutive read only
> transactions that go to different servers a client could still observe
> database state going backwards.

True. This patch is about "read your writes", not, erm, "read your
reads". That may indeed be problematic for some users. It's not a
very satisfying answer but I guess you could run a dummy write query
on the primary every time you switch between standbys, or before
telling another client to run read-only queries after you have run
some yourself, to convert your "r r" sequence into an "r w r" sequence...
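Why the dummy write helps can be sketched with a toy cluster. The model is hypothetical: a standby's visible state is reduced to the LSN it has replayed. The patch's read-your-writes guarantee is also simplified to "the commit returns only once every available standby has replayed it." A standby can never be ahead of the primary. So the dummy write's commit LSN is beyond anything the first read could have seen, and the second read must be at least as new.

```python
class ToyCluster:
    """Hypothetical model: one primary and two standbys, A and B.
    A standby's visible state is simply the LSN it has replayed up to."""

    def __init__(self):
        self.primary_lsn = 0
        self.replay_lsn = {"A": 0, "B": 0}

    def write(self):
        # An ordinary commit on the primary; standbys catch up later.
        self.primary_lsn += 1
        return self.primary_lsn

    def replay(self, standby, upto):
        # A standby cannot replay past what the primary has written.
        self.replay_lsn[standby] = max(self.replay_lsn[standby],
                                       min(upto, self.primary_lsn))

    def causal_write(self):
        # Simplified stand-in for the patch's guarantee: the commit returns
        # only once every available standby has replayed it.
        lsn = self.write()
        for name in self.replay_lsn:
            self.replay(name, lsn)
        return lsn

    def read(self, standby):
        return self.replay_lsn[standby]


c = ToyCluster()
for _ in range(3):
    c.write()
c.replay("A", 3)   # A is fully caught up
c.replay("B", 1)   # B is lagging

# "r r": read on A, then on B. The second read sees an older state.
rr_first, rr_second = c.read("A"), c.read("B")
print(rr_first, rr_second)    # 3 1

# "r w r": the same reads with a dummy causal write in between.
rwr_first = c.read("A")
c.causal_write()
rwr_second = c.read("B")
print(rwr_first, rwr_second)  # 3 4
```
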

> It seems that fixing that would
> require either keeping some per client state or a global agreement on
> what snapshots are safe to provide, both of which you tried to avoid
> for this feature.

Agreed. You briefly mentioned this problem in the context of pairs of
read-only transactions a while ago[1]. As you said then, it does seem
plausible to do that with a token system that gives clients the last
commit LSN from the snapshot used by a read only query, so that you
can ask another standby to make sure that LSN has been replayed before
running another read-only transaction. This could be handled
explicitly by a motivated client that is talking to multiple nodes. A
more general problem is client A telling client B to go and run
queries and expecting B to see all transactions that A has seen; A
now has to pass the LSN along with that communication, or rely on some
kind of magic proxy that sees all transactions, or a radically
different system with a GTM.
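A client-side version of that token scheme might look roughly like the sketch below. It simulates standbys in-process. Standby, wait_for_lsn() and read_with_token() are hypothetical names. The token is approximated by the standby's replay LSN at query time rather than the exact last commit LSN in the snapshot. Against a real server, the waiting step would presumably poll something like pg_last_xlog_replay_location() (pg_last_wal_replay_lsn() from PostgreSQL 10) on the second standby.

```python
import threading


class Standby:
    """Toy standby: replay_lsn only moves forward as WAL is replayed."""

    def __init__(self, name, replay_lsn=0):
        self.name = name
        self.replay_lsn = replay_lsn
        self._cv = threading.Condition()

    def replay_to(self, lsn):
        with self._cv:
            self.replay_lsn = max(self.replay_lsn, lsn)
            self._cv.notify_all()   # wake any readers waiting on a token

    def wait_for_lsn(self, token, timeout=None):
        # Block until this standby has replayed at least `token`.
        # Returns False if the timeout expires first.
        with self._cv:
            return self._cv.wait_for(lambda: self.replay_lsn >= token, timeout)

    def read_only_query(self):
        # Returns (result, token); the token is the LSN the snapshot reflects.
        with self._cv:
            return f"state@{self.replay_lsn}", self.replay_lsn


def read_with_token(standby, token, timeout=5.0):
    """Run a read-only query only after `standby` has caught up to `token`."""
    if not standby.wait_for_lsn(token, timeout):
        raise TimeoutError(f"{standby.name} has not replayed {token} yet")
    return standby.read_only_query()


a = Standby("A", replay_lsn=300)
b = Standby("B", replay_lsn=100)

result, token = a.read_only_query()
# B is behind; simulate its replay catching up shortly afterwards.
threading.Timer(0.05, b.replay_to, args=(300,)).start()
result2, token2 = read_with_token(b, token)
print(result, result2)    # state@300 state@300
```

The same call covers the client A to client B case: B simply receives `token` along with whatever message tells it to go and run queries. Here, a standby that cannot catch up in time produces an error rather than a stale read.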

[1] https://www.postgresql.org/message-id/CA%2BCSw_u4Vy5FSbjVc7qms6PuZL7QV90%2BonBEtK9PFqOsNj0Uhw@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
