Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2015-11-18 10:50:17
Message-ID: CAEepm=3NUTR1nZ08P31KtY3cUGDbDZTUpt75C3ZsA5ZWzBg2mg@mail.gmail.com
Lists: pgsql-hackers

Hi,

Here is a new version of the patch with a few small improvements:

1. Adopted the term '[read] lease', replacing various hand-wavy language
in the comments and code. That seems to be the established term for this
approach[1].

2. Reduced the stalling time on failure. When things go wrong with a
standby (such as losing contact with it), instead of stalling for a
conservative amount of time longer than any lease that might have been
granted, the primary now stalls only until the expiry of the last lease
that actually was granted to a given dropped standby, which should be
sooner.

3. Fixed a couple of bugs that showed up in testing and review (some bad
flow control in the signal handling, and a bug in a circular buffer), and
changed the recovery->walreceiver wakeup signal handling to block the
signal except while waiting in walrcv_receive. It didn't seem a good idea
to interrupt arbitrary syscalls in walreceiver, so I thought that would be
an improvement; of course that area's going to be reworked by Simon's
patch anyway, as discussed elsewhere.
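
To illustrate item 2, here is a minimal sketch of the two stalling rules.
It is not code from the patch: the type TimestampUs, both function names,
and the safety_margin parameter are hypothetical, and clock skew between
nodes is ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamp type: microseconds since an arbitrary epoch. */
typedef int64_t TimestampUs;

/*
 * Old rule (as described in item 2): after losing a standby, stall for a
 * conservative period longer than any lease that might have been granted.
 */
TimestampUs
conservative_stall_until(TimestampUs now,
                         TimestampUs max_lease_duration,
                         TimestampUs safety_margin)
{
    assert(max_lease_duration >= 0 && safety_margin >= 0);
    return now + max_lease_duration + safety_margin;
}

/*
 * New rule: stall only until the last lease actually granted to the
 * dropped standby expires.  If that lease has already expired, there is
 * nothing left to wait for.
 */
TimestampUs
lease_based_stall_until(TimestampUs now, TimestampUs last_lease_expiry)
{
    return last_lease_expiry > now ? last_lease_expiry : now;
}
```

Under these assumptions the second function never returns a later time
than the first, provided the granted lease is no longer than
max_lease_duration, which is why the stall should end sooner.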

Restating the central idea using the new terminology: So long as they are
replaying fast enough, the primary grants a series of causal reads leases
to standbys, allowing them to handle causal reads queries locally without
any inter-node communication for a limited time. A lease is a promise
that, before returning from commit, the primary will wait until the
standby has either applied the commit record OR been dropped from the set
of available causal reads standbys and knows that it has been dropped.
That is what upholds the causal reads guarantee. In the worst case the
primary can do that by waiting for the most recently granted lease to
expire.
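
The rule above can be sketched as a pair of predicates. This is only an
illustration under stated assumptions, not the patch's implementation:
the StandbyState struct, its fields, and both function names are
hypothetical, times and WAL positions are plain integers, and a dropped
standby is modeled only by the worst case (waiting out its lease) rather
than by any explicit acknowledgement that it knows it was dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-standby bookkeeping kept by the primary. */
typedef struct StandbyState
{
    bool     available;     /* still in the set of causal reads standbys? */
    uint64_t applied_lsn;   /* last WAL position reported as applied */
    int64_t  lease_expiry;  /* expiry time of the most recent lease granted */
} StandbyState;

/*
 * Primary side: may a commit at commit_lsn return at time 'now'?
 * Every available standby must have applied the commit record; every
 * dropped standby must no longer hold an unexpired lease.
 */
bool
commit_may_return(const StandbyState *standbys, size_t n,
                  uint64_t commit_lsn, int64_t now)
{
    assert(standbys != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (standbys[i].available)
        {
            if (standbys[i].applied_lsn < commit_lsn)
                return false;   /* still waiting for apply */
        }
        else if (now < standbys[i].lease_expiry)
            return false;       /* dropped, but its lease may still be live */
    }
    return true;
}

/* Standby side: a causal reads query may run locally only under a live lease. */
bool
standby_may_serve_causal_read(int64_t lease_expiry, int64_t now)
{
    return now < lease_expiry;
}
```

The two predicates are complementary: the standby stops serving causal
reads queries at lease_expiry, and the primary does not release a commit
past a dropped standby before that same moment.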

I've also attached a couple of things which might be useful when trying the
patch out: test-causal-reads.c which can be used to test performance and
causality under various conditions, and test-causal-reads.sh which can be
used to bring up a primary and a bunch of local hot standbys to talk to.
(In the hope of encouraging people to take the patch for a spin...)

[1] Originally from a well-known 1989 paper on caching, but in the context
of databases and synchronous replication, see for example the recent
papers on "Niobe" and "Paxos Quorum Leases" (especially the reference to
Google Megastore). Of course a *lot* more is going on in those very
different algorithms, but at some level "read leases" are being used in
some of those consensus database systems to allow local-node-only reads
for a limited time while upholding some kind of global consistency
guarantee. I spent a bit of time talking about consistency levels with
database guru and former colleague Alex Scotti, who works on a Paxos-based
system, and he gave me the initial idea to try out a lease-based
consistency system for Postgres streaming replication. It seems like a
very useful point in the space of trade-offs to me.

--
Thomas Munro
http://www.enterprisedb.com

Attachment              Content-Type              Size
test-causal-reads.sh    application/x-sh          1.8 KB
test-causal-reads.c     text/x-csrc               4.5 KB
causal-reads-v3.patch   application/octet-stream  72.2 KB
