Re: Sync Rep Design

From: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2011-01-03 01:30:12
Message-ID: 4D2126A4.9050203@2ndquadrant.com
Lists: pgsql-hackers

On 2.1.2011 5:36, Robert Haas wrote:
> On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>> Yes, working out the math is a good idea. Things are much clearer if we
>> do that.
>>
>> Let's assume we have 98% availability on any single server.
>>
>> 1. Having one primary and 2 standbys, either of which can acknowledge,
>> and we never lock up if both standbys fail, then we will have 99.9992%
>> server availability. (So PostgreSQL hits "5 Nines", with data
>> guarantees). ("Maximised availability")
> I don't agree with this math. If the master and one standby fail
> simultaneously, the other standby is useless, because it may or may
> not be caught up with the master. You know that the last transaction
> acknowledged as committed by the master is on at least one of the two
> standbys, but you don't know which one, and so you can't safely
> promote the surviving standby.
> (If you are working in an environment where promoting the surviving
> standby when it's possibly not caught up is OK, then you don't need
> sync rep in the first place: you can just run async rep and get much
> better performance.)
> So the availability is 98% (you are up when the master is up) + 98%^2
> * 2% (you are up when both slaves are up and the master is down) =
> 99.92%. If you had only a single standby, then you could be certain
> that any commit acknowledged by the master was on that standby. Thus
> your availability would be 98% (up when master is up) + 98% * 2% (you
> are up when the master is down and the slave is up) = 99.96%.
>
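As a rough check of the arithmetic quoted above, here is a minimal Python sketch. It assumes that servers fail independently and that each is up 98% of the time. The function names are made up for illustration and are not part of any PostgreSQL tooling.

```python
# Assumption: every server is independently up with probability 0.98.
P_UP = 0.98
P_DOWN = 1 - P_UP


def any_server_up(n_servers: int) -> float:
    """Availability if any single surviving server counts as 'up'."""
    return 1 - P_DOWN ** n_servers


def two_standbys_either_acks() -> float:
    """Up when the master is up, or when the master is down and BOTH
    standbys are up (only then is the last acknowledged commit known
    to be on a surviving standby)."""
    return P_UP + P_DOWN * P_UP ** 2


def one_standby() -> float:
    """Up when the master is up, or when the master is down and the
    single standby is up."""
    return P_UP + P_DOWN * P_UP


if __name__ == "__main__":
    print(f"any of 3 servers up:       {any_server_up(3):.4%}")
    print(f"2 standbys, either ACKs:   {two_standbys_either_acks():.4%}")
    print(f"1 standby:                 {one_standby():.4%}")
```

Under these assumptions the three cases come out at roughly 99.9992%, 99.92% and 99.96%, matching the figures in the quoted text.
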
OTOH, in the case where you need _all_ the slaves to confirm, any failing
slave brings the master down, so each added slave reduces availability by
a further 2%.
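
The all-slaves-must-confirm case can be sketched the same way. This is only an illustration, assuming independent failures, 98% availability per server, and a master that stops accepting commits as soon as any slave is down. The function name is hypothetical.

```python
def all_must_confirm(n_slaves: int, p_up: float = 0.98) -> float:
    """Availability when the master and every one of its slaves must be
    up for commits to proceed (assumes independent failures)."""
    return p_up ** (n_slaves + 1)


if __name__ == "__main__":
    for n in range(4):
        print(f"master + {n} slave(s): {all_must_confirm(n):.4%}")
```

Each extra slave multiplies the availability by 0.98 under this model, which is the roughly 2% loss per added slave.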

The solution to achieving good durability AND availability is to require
N past the post instead of 1 past the post, i.e. to wait for ACKs from N
standbys rather than from just one.

In this case you can get to 99.9992% availability with master + 3 sync
slaves, 2 of which must ACK.

---------------------------------------
Hannu Krosing
Performance and Infinite Scalability Consultant
http://www.2ndQuadrant.com/books/
