Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Date: 2016-03-30 06:22:31
Message-ID: CAEepm=1Z-E4X8wXEY_VhHAZ=AhN1-Xsj4pFfQycjjMmW5+MRZA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 30, 2016 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> OK, I committed this, with a few tweaks. In particular, I added a
> flag variable instead of relying on "latch set" == "need to send
> reply"; the other changes were cosmetic.
>
> I'm not sure how much more of this we can realistically get into 9.6;
> the latter patches haven't had much review yet. But I'll set this
> back to Needs Review in the CommitFest and we'll see where we end up.
> But even if we don't get anything more than this, it's still rather
> nice: remote_apply turns out to be only slightly slower than remote
> flush, and it's a guarantee that a lot of people are looking for.

Thank you Michael and Robert!

Please find attached the rest of the patch series, rebased against
master. The goal of the 0002 patch is to provide an accurate
indication of the current replay lag on each standby, visible to users
like this:

postgres=# select application_name, replay_lag from pg_stat_replication;
 application_name │   replay_lag
──────────────────┼─────────────────
 replica1         │ 00:00:00.000299
 replica2         │ 00:00:00.000323
 replica3         │ 00:00:00.000319
 replica4         │ 00:00:00.000303
(4 rows)

It works by maintaining a buffer of (end of WAL, time now) samples
received from the primary, and then eventually feeding those times
back to the primary when the recovery process replays the
corresponding locations.
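To illustrate the mechanism, here is a minimal, hypothetical sketch of such a sample buffer in C (the identifiers, buffer size, and once-per-second policy here are illustrative assumptions, not the patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL's types. */
typedef uint64_t XLogRecPtr;    /* WAL location */
typedef int64_t TimestampTz;    /* microseconds */

#define LAG_BUFFER_SIZE 8

typedef struct
{
    XLogRecPtr  lsn;            /* end of WAL when the sample was taken */
    TimestampTz time;           /* primary's clock at that moment */
} LagSample;

typedef struct
{
    LagSample   buffer[LAG_BUFFER_SIZE];
    int         write_head;     /* next slot to write */
    int         read_head;      /* oldest unconsumed sample */
    TimestampTz last_sample_time;
} LagTracker;

/* Record a (LSN, time) pair, at most once per second of sample time. */
static void
lag_tracker_write(LagTracker *t, XLogRecPtr lsn, TimestampTz now)
{
    int next = (t->write_head + 1) % LAG_BUFFER_SIZE;

    if (now - t->last_sample_time < 1000000)    /* 1s in microseconds */
        return;
    if (next == t->read_head)
        return;                                 /* buffer full: drop sample */
    t->buffer[t->write_head].lsn = lsn;
    t->buffer[t->write_head].time = now;
    t->write_head = next;
    t->last_sample_time = now;
}

/*
 * Consume every sample whose LSN has now been replayed, returning the
 * newest such timestamp; this is what the standby feeds back to the
 * primary in its reply messages.
 */
static bool
lag_tracker_read(LagTracker *t, XLogRecPtr replayed_upto, TimestampTz *time)
{
    bool found = false;

    while (t->read_head != t->write_head &&
           t->buffer[t->read_head].lsn <= replayed_upto)
    {
        *time = t->buffer[t->read_head].time;
        t->read_head = (t->read_head + 1) % LAG_BUFFER_SIZE;
        found = true;
    }
    return found;
}
```

The key property is that the stored timestamps originate on the primary, so the standby only echoes them back once replay passes the associated LSN.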

Compared to approaches based on commit timestamps, this approach has
the advantage of providing non-misleading information between commits.
For example, if you run a batch load job that takes 1 minute to insert
the whole phonebook and no other transactions run, you will see
replay_lag updating regularly throughout that minute, whereas typical
commit timestamp-only approaches will show an increasing lag time
until a commit record is eventually applied. Compared to simple LSN
comparisons, it reports lag in time rather than bytes of WAL, which is
often more meaningful to DBAs.

When the standby is entirely caught up and there is no write activity,
the reported time effectively represents the ping time between the
servers, and is updated every wal_sender_timeout / 2, when keepalive
messages are sent. While new WAL traffic is arriving, the walreceiver
records timestamps at most once per second in a circular buffer, and
then sends back replies containing the recorded timestamps as fast as
the recovery process can apply the corresponding xlog. The lag number
you see is computed by the primary server by comparing two timestamps
generated by its own system clock, one of which has been on a journey
to the standby and back.
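That final comparison can be sketched in a few lines of C (a hypothetical function name, not the patch's actual code; the zero clamp is an assumed safety guard):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in PostgreSQL */

/*
 * Sketch of the primary-side computation. "sample_time" left the primary
 * with the WAL stream and came back in a standby reply once the recovery
 * process applied the corresponding WAL. Both values come from the
 * primary's own clock, so no clock synchronization between the two
 * servers is required.
 */
static TimestampTz
compute_replay_lag(TimestampTz sample_time, TimestampTz now)
{
    TimestampTz lag = now - sample_time;

    return lag > 0 ? lag : 0;   /* guard against the clock stepping back */
}
```

Because only one clock is involved, the result is a true round-trip measurement rather than a cross-server timestamp difference.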

Accurate lag estimates are a prerequisite for the 0004 patch (about
which more later), but I believe users would find this valuable as a
feature on its own.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0002-replay-lag-v11.patch application/octet-stream 24.2 KB
0003-refactor-syncrep-exit-v11.patch application/octet-stream 4.6 KB
0004-causal-reads-v11.patch application/octet-stream 74.3 KB
