Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2015-11-11 09:22:11
Message-ID: CAEepm=3X40PP-JDW2cnWyRQHod9VAu-oMBTXy7wZqdRMtyhi0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 11/11/2015 10:23 AM, Simon Riggs wrote:
>
>> On 11 November 2015 at 05:37, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com
>> >
>> wrote:
>>
>> Many sites use hot standby servers to spread read-heavy workloads over
>> more
>>
>>> hardware, or at least would like to. This works well today if your
>>> application can tolerate some time lag on standbys. The problem is that
>>> there is no guarantee of when a particular commit will become visible for
>>> clients connected to standbys. The existing synchronous commit feature
>>> is
>>> no help here because it guarantees only that the WAL has been flushed on
>>> another server before commit returns. It says nothing about whether it
>>> has
>>> been applied or whether it has been applied on the standby that you
>>> happen
>>> to be talking to.
>>>
>>
>> Thanks for working on this issue.
>>
>
> +1.
>
> 3. Commit on the primary with "causal_reads = on" waits for all
>>> 'available' standbys either to apply the commit record, or to cease to be
>>> 'available' and begin raising the error if they are still alive (because
>>> their authorizations have expired).
>>>
>>>
>> This causes every writer to wait.
>>
>> What we want is to isolate the wait only to people performing a write-read
>> sequence, so I think it should be readers that wait. Let's have that
>> debate
>> up front before we start reviewing the patch.
>>
>
> Agreed. And in the write-read sequence, you don't need to wait at the
> write either, it's enough that you wait just before you start doing the
> read. An application might do a lot of other things between the two, so
> that in most cases, there would in fact be no waiting as the record is
> already applied when you perform the read.
>
> I'm thinking the client should get some kind of a token back from the
> commit, and it could use the token on the standby, to wait for that commit
> to be applied. The token could be just the XID, or the LSN of the commit
> record. Or the application could generate the token and pass it to the
> server in the commit, similar to how 2PC works. So the interaction would be
> something like:
>
> In master:
> BEGIN;
> INSERT INTO FOO ...;
> COMMIT;
> Server returns: COMMITted with token 1234
>
> Later, in standby:
> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> SELECT * FROM foo;
> ...

I thought about this question, and considered three different approaches:

1. Reader waits with exposed LSNs, as Heikki suggests. This is what
BerkeleyDB does in "read-your-writes" mode. It means that application
developers have the responsibility for correctly identifying transactions
with causal dependencies and dealing with LSNs (or whatever equivalent
tokens), potentially even passing them to other processes where the
transactions are causally dependent but run by multiple communicating
clients (for example, communicating microservices). This makes it
difficult to retrofit load balancing to pre-existing applications and (like
anything involving concurrency) difficult to reason about as applications
grow in size and complexity. It is efficient if done correctly, but it is
a tax on application complexity.

2. Reader waits for a conservatively chosen LSN. This is roughly what
MySQL derivatives do in their "causal_reads = on" and "wsrep_sync_wait = 1"
modes. Read transactions would start off by finding the current end of WAL
on the primary, since that must be later than any commit that already
completed, and then waiting for that to apply locally. That means every
read transaction waits for a complete replication lag period, potentially
unnecessarily. This is tax on readers with unnecessary waiting.

3. Writer waits, as proposed. In this model, there is no tax on readers
(they have zero overhead, aside from the added complexity of dealing with
the possibility of transactions being rejected when a standby falls behind
and is dropped from 'available' status; but database clients must already
deal with certain types of rare rejected queries/failures such as
deadlocks, serialization failures, server restarts etc). This is a tax on
writers.

My thinking was that the reason for wanting to load balance over a set of
hot standbys is because you have a very read-heavy workload, so it makes
sense to tax the writers and leave the many dominant readers unburdened, so
(3) should be better than (2) for the majority of users who want such a
configuration. (Note also that it's not a requirement to tax every write;
with this proposal you can set causal_reads to off for those transactions
where you know there is no possibility of a causally dependent read).

As for (1), my thinking was that most application developers would probably
prefer not to have to deal with that type of interface. For users who do
want to do that, it would be comparatively simple to make that possible,
and would not conflict with this proposal. This proposal could be used by
people retrofitting load balancing to an existing applications with
relative ease, or simply not wanting to have to deal with LSNs and
complexity. (I have considered proposing
pg_wait_for_xlog_replay_location(lsn, timeout) separately, which could be
called on a standby with the lsn obtained from pg_current_xlog_location()
on the primary any time after a COMMIT completes, but I was thinking of
that as a different feature addressing a different user base: people
prepared to do more work to squeeze out some extra performance.)

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2015-11-11 10:03:10 proposal: numeric scale functions
Previous Message Atri Sharma 2015-11-11 08:52:10 Re: Proposal: "Causal reads" mode for load balancing reads without stale data