Re: Causal reads take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Causal reads take II
Date: 2017-06-24 04:05:21
Message-ID: CAEepm=1k0VyP8s1yA56_VBmfoXFrsfFHjFOtQjVO_MbxDukyLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Fri, Jun 23, 2017 at 11:48 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Apply the patch after first applying a small bug fix for replication
> lag tracking[4]. Then:

That bug fix was committed, so now causal-reads-v17.patch can be
applied directly on top of master.

> 1. Set up some streaming replicas.
> 2. Stick causal_reads_max_replay_lag = 2s (or any time you like) in
> the primary's postgresql.conf.
> 3. Set causal_reads = on in some transactions on various nodes.
> 4. Try to break it!

Someone asked me off-list how to set this up quickly and easily for
testing. Here is a shell script that will start up a primary server
(port 5432) and 3 replicas (ports 5441 to 5443). Set the two paths at
the top of the file before running in. Log in with psql postgres [-p
<port>], then SET causal_reads = on to test its effect.
causal_reads_max_replay_lay is set to 2s and depending on your
hardware you might find that stuff like CREATE TABLE big_table AS
SELECT generate_series(1, 10000000) or a large COPY data load causes
replicas to be kicked out of the set after a while; you can also pause
replay on the replicas with SELECT pg_wal_replay_pause() and
pg_wal_replay_resume(), kill -STOP/-CONT or -9 the walreceiver
processes to similar various failure modes, or run the replicas
remotely and unplug the network. SELECT application_name, replay_lag,
causal_reads_state FROM pg_state_replication to see the current
situation, and also monitor the primary's LOG messages about
transitions. You should find that the
"read-your-writes-or-fail-explicitly" guarantee is upheld, no matter
what you do, and furthermore than failing or lagging replicas don't
hold hold the primary up very long: in the worst case
causal_reads_lease_time for lost contact, and in the best case the
time to exchange a couple of messages with the standby to tell it its
lease is revoked and it should start raising an error. You might find
test-causal-reads.c[1] useful for testing.

> Maybe it needs a better name.

Ok, how about this: the feature could be called "synchronous replay".
The new column in pg_stat_replication could be called sync_replay
(like the other sync_XXX columns). The GUCs could be called
synchronous replay, synchronous_replay_max_lag and
synchronous_replay_lease_time. The language in log messages could
refer to standbys "joining the synchronous replay set".

Restating the purpose of the feature with that terminology: If
synchronous_replay is set to on, then you see the effects of all
synchronous_replay = on transactions that committed before your
transaction began, or an error is raised if that is not possible on
the current node. This allows applications to direct read-only
queries to read-only replicas for load balancing without seeing stale
data. Is that clearer?

Restating the relationship with synchronous replication with that
terminology: while synchronous_commit and synchronous_standby_names
are concerned with distributed durability, synchronous_replay is
concerned with distributed visibility. While the former prevents
commits from returning if the configured level of durability isn't met
(for example "must be flushed on master + any 2 standbys"), the latter
will simply drop any standbys from the synchronous replay set if they
fail or lag more than synchronous_replay_max_lag. It is reasonable to
want to use both features at once: my policy on distributed
durability might be that I want all transactions to be flushed to disk
on master + any of three servers before I report information to users,
and my policy on distributed visibility might be that I want to be
able to run read-only queries on any of my six read-only replicas, but
don't want to wait for any that lag by more than 1 second.

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D3NF%3D7eLkVR2fefVF9bg6RxpZXoQFmOP3RWE4r4iuO7vg%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
test-causal-reads.sh application/x-sh 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-06-24 04:26:42 Re: FIPS mode?
Previous Message Curtis Ruck 2017-06-24 03:56:09 FIPS mode?