Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2015-11-15 22:24:05
Message-ID: CAEepm=3OgFBXONe-2=P+nkROovQ7cOsD3Q7TbTE0SQg-Oe=fSA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 15, 2015 at 11:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 12 November 2015 at 18:25, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
> wrote:
>
>
>> (I don't want to get bogged down in details while we're talking about
>> the 30,000 foot view.)
>>
>
> Hmm, if that's where we're at, I'll summarize my thoughts.
>
> All of this discussion presupposes we are distributing/load balancing
> queries so that reads and writes might occur on different nodes.
>
> We need a good balancer. Any discussion of this that ignores the balancer
> component is only talking about half the solution. What we need to do is
> decide whether functionality should live in the balancer or the core.
>
> Your option (1) is viable, but only in certain cases. We could add support
> for some token/wait mechanism, but as you say, this would require
> application changes, not pooler changes.
>
> Your option (2) is wider but also worse in some ways. It can be
> implemented in a pooler.
>
> Your option (3) doesn't excite me much. You've got a load of stuff that
> really should happen in a pooler. And at its core we have
> synchronous_commit = apply but with a timeout rather than a wait. So
> anyway, consider me nudged to finish my patch to provide that capability
> by 1 Jan.
>

Just to be clear, this patch doesn't use a "timeout rather than a wait":
commits always wait for the current set of available causal reads standbys
to apply the commit. It's just that standbys are dropped from that set
fairly quickly if they fail to keep up, a bit like a RAID controller
dropping a failing disk. And the drop uses a protocol that ensures the
dropped standby starts raising the error for causal reads queries, even if
contact with it has been lost, so the causal reads guarantee is maintained
at all times for all clients.
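
To make that concrete, here is a rough pseudo-C sketch of the commit-side
wait; every name in it is illustrative rather than taken from the patch:

    /* Illustrative sketch only -- none of these names come from the patch. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;

    typedef struct Standby
    {
        bool        available;      /* currently in the causal reads set? */
        /* ... connection state, last-applied LSN, lease expiry ... */
    } Standby;

    /* Stand-ins for the patch's internals. */
    extern bool WaitForApplyFeedback(Standby *s, XLogRecPtr lsn, int timeout_ms);
    extern void WaitForLeaseRevocation(Standby *s);
    extern int  causal_reads_timeout_ms;

    /*
     * Called at commit time: block until every available standby reports
     * that it has applied commit_lsn.  A standby that can't keep up is
     * removed from the set, but only after its lease has been revoked or
     * has expired -- so it is already raising errors for causal reads
     * queries before the primary stops waiting for it.  If contact with
     * the standby is lost entirely, its lease still expires by itself,
     * which is what keeps the guarantee intact.
     */
    void
    CausalReadsWaitForApply(Standby *standbys, int nstandbys, XLogRecPtr commit_lsn)
    {
        for (int i = 0; i < nstandbys; i++)
        {
            Standby    *s = &standbys[i];

            if (!s->available)
                continue;

            if (!WaitForApplyFeedback(s, commit_lsn, causal_reads_timeout_ms))
            {
                /* Like the RAID controller dropping a failing disk. */
                WaitForLeaseRevocation(s);
                s->available = false;
            }
        }
    }

The key point is the ordering: a standby only leaves the set after it is
guaranteed to be raising errors, never before.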

> On a related note, any further things like "GUC causal_reads_standby_names"
> should be implemented by Node Registry as a named group of nodes. We can
> have as many arbitrary groups of nodes as we want. If that sounds strange
> look back at exactly why GUCs are called GUCs.
>

Agreed, the application_name whitelist stuff is clunky. I left it out of
the first version I posted, not wanting the focus of this proposal to be
side-tracked. But as Ants Aasma pointed out, some users might need
something like that, so I posted a second version that follows the
established example, again not wanting to distract with anything new in
that area. Of course that would eventually be replaced or improved as part
of a future node topology management project.
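
For what it's worth, the interim shape follows the synchronous_standby_names
precedent; roughly like this (causal_reads_standby_names is the GUC named
above, the other names and values are my shorthand, not the final word):

    # On the primary's postgresql.conf:
    causal_reads_timeout = '4s'          # lag budget before a standby is dropped
    causal_reads_standby_names = '*'     # application_name whitelist

    -- And on any connection that wants the guarantee:
    SET causal_reads = on;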

--
Thomas Munro
http://www.enterprisedb.com
