Re: Issues with Quorum Commit

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-07 17:44:43
Message-ID: AANLkTi=B0d75Pf4W4GUgKVHhCJs_Rh=CMWNp5xfT40B_@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> So if you have k = 3 and N = 10, then you can have 10 standbys and only
> 3 of them need to ack any specific commit for the master to proceed. As
> long as (a) you retain at least one of the 3 which ack'd, and (b) you
> have some way of determining which standby is the most "caught up", data
> loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong
> 4, to lose data.
>
> The advantage of this for availability over just having k = N = 3 comes
> when one of the standbys is responding slowly (due to traffic) or goes
> offline unexpectedly due to a hardware failure.  In the k = N = 3 case,
> the system halts.  In the k = 3, N = 10 case, you can lose up to 7
> standbys without the system going down.

Sure, but here is where I might not be following.

If you want "synchronous replication" because you want "query
availability" while making sure you're not getting "stale" queries from
all your slaves, then using your k < N (k = 3 and N = 10) situation is
screwing yourself.

To get "non-stale" responses, you can only query those k=3 servers.
But you've shot yourself in the foot because you don't know which
3/10 those will be. The other 7 *are* stale (by definition). They
talk about picking the "caught up" slave when the master fails, but
you actually need to do that for *every query*.
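
To make that concrete, here is a rough toy model in Python. Everything in
it (the names commit and non_stale, the integer "LSN", picking the k
ackers at random) is made up for illustration; it is not how PostgreSQL
does anything, and it ignores that the laggards would keep catching up
asynchronously. It only shows that the set of non-stale standbys is
exactly the set that acked the latest commit, and that set can change on
every commit.

```python
import random

def commit(standby_lsn, k, lsn, rng):
    """Model one commit: k standbys (picked at random) ack and advance to lsn."""
    acked = rng.sample(sorted(standby_lsn), k)
    for name in acked:
        standby_lsn[name] = lsn
    return set(acked)

def non_stale(standby_lsn, master_lsn):
    """Standbys that have the latest commit and are safe to query."""
    return {name for name, lsn in standby_lsn.items() if lsn == master_lsn}

if __name__ == "__main__":
    rng = random.Random(0)
    standbys = {f"s{i}": 0 for i in range(1, 11)}  # N = 10, all start at LSN 0
    for lsn in range(1, 4):
        commit(standbys, k=3, lsn=lsn, rng=rng)
        # A client would have to learn this set again after every commit.
        print(lsn, sorted(non_stale(standbys, lsn)))
```

With k = N the non_stale set is always "all of them", which is the only
case where a client can pick a standby without asking first.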

If you say they are "pretty close so by the time you get the query to
them they will be caught up", well then, all you really want is good
async replication, you don't really *need* the synchronous part.

The only case I see a "race to quorum" type of k < N being useful is
if you're just trying to duplicate data everywhere, but not actually
querying any of the replicas. I can see that "all queries go to the
master, but the chances are pretty high that multiple machines are
going to fail so I want >> multiple replicas" being useful, but I
*don't* think that's what most people are wanting in their "I want 3
of 10 servers to ack the commit".
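
For that "duplicate the data, query only the master" case, the failover
side could be sketched like this in Python. Again, the function names and
the integer LSNs are hypothetical, purely for illustration: on master
failure you promote the most caught-up survivor, and a commit is safe as
long as at least one standby that acked it is still alive.

```python
def pick_promotion_candidate(standby_lsn, survivors):
    """Return the surviving standby with the highest LSN, or None if none survive."""
    alive = {name: standby_lsn[name]
             for name in sorted(survivors) if name in standby_lsn}
    if not alive:
        return None
    return max(alive, key=alive.get)

def commit_survives(standby_lsn, survivors, committed_lsn):
    """True if promoting the best survivor keeps everything up to committed_lsn."""
    best = pick_promotion_candidate(standby_lsn, survivors)
    return best is not None and standby_lsn[best] >= committed_lsn

if __name__ == "__main__":
    # Hypothetical state: commit 5 was acked by s1, s2, s3 only (k = 3).
    lsn = {"s1": 5, "s2": 5, "s3": 5, "s4": 4, "s5": 3}
    print(commit_survives(lsn, {"s3", "s4"}, 5))  # one acker survived
    print(commit_survives(lsn, {"s4", "s5"}, 5))  # all three ackers lost
```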

The difference between good async and sync is only the *guarantee*.
If you don't need the guarantee, you don't need the synchronous part.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
