Re: Sync Rep v17

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep v17
Date: 2011-03-02 21:33:20
Message-ID: AANLkTimfQU_pu2ZJ1FUBN4fZakLjWW6EozCXtW4p35_z@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> 1. Everything is humming along.
>> 2. The network link between the master and standby drops.
>> 3. Then it comes back up again.
>>
>> After (2) and before (3), what should the behavior of the master be?  It
>> seems clear to me that it should WAIT.  Otherwise, a crash on the
>
> That just means you want data high availability, not service HA.  Some
> people want the *service* to stay available in such a situation.
>
>> master now leaves you with transactions that were confirmed committed
>> but not actually replicated to the standby.  If you were OK with that
>> scenario, you would have used asynchronous replication in the first
>> place.
>
> What is so hard to understand about the "worst case scenario" being different
> from "expected conditions"?  We all know that getting the last percent
> is more expensive than getting the first 99.  We have no reason to
> force people into building for the last percent whatever their context.

I don't understand how synchronous replication with
allow_standalone_primary=on gives you ANY extra nines. AFAICS, the
only point of having synchronous replication is that you wait to
acknowledge the commit to the client until the commit record has been
replicated. Doing that only when the standby happens to be connected
doesn't seem like it helps much.

If the master is up, then it doesn't really matter what the standby
does; we don't need high availability in that case, because we have
just plain regular old availability.

If the master goes down, then we need to know that we haven't lost any
confirmed-committed transactions. With allow_standalone_primary=on,
we don't know that. Some might be lost, or none might be. Even if we
have 100 separate standbys, there is no way of knowing whether there
was a time period just before the crash during which the master
couldn't get out to the Internet, and some commits by clients on the
local network went through. Maybe with some careful network
engineering you can convince yourself that that isn't very likely, but
I sure wouldn't bet on it.
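
To make the scenario concrete, here is a toy model of it. This is only a sketch, not PostgreSQL code: the Master class, its fields, and the run() helper are made up for illustration, and "replication" is reduced to appending to a list while the link is up. It replays steps 1 and 2 from the quoted scenario and then asks which confirmed commits would be missing from the standby if the master crashed before the link came back.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical toy master; not how PostgreSQL implements sync rep."""
    allow_standalone_primary: bool
    link_up: bool = True
    confirmed: list = field(default_factory=list)   # acknowledged to clients
    replicated: list = field(default_factory=list)  # received by the standby
    waiting: list = field(default_factory=list)     # blocked, not yet acknowledged

    def commit(self, txid):
        if self.link_up:
            # Normal case: replicate first, then acknowledge.
            self.replicated.append(txid)
            self.confirmed.append(txid)
        elif self.allow_standalone_primary:
            # Standby unreachable, but we acknowledge anyway.
            self.confirmed.append(txid)
        else:
            # Standby unreachable, so the commit waits for it.
            self.waiting.append(txid)

    def lost_on_crash(self):
        """Confirmed commits the standby never received."""
        return [t for t in self.confirmed if t not in self.replicated]


def run(allow_standalone_primary):
    m = Master(allow_standalone_primary)
    m.commit(1)        # 1. everything is humming along
    m.link_up = False  # 2. the network link drops
    m.commit(2)
    m.commit(3)
    return m.lost_on_crash()  # master crashes before step 3


print(run(True))   # commits confirmed but absent from the standby
print(run(False))  # nothing confirmed that the standby lacks
```

In this model, the standalone setting leaves commits 2 and 3 confirmed but unreplicated, while the waiting setting confirms nothing the standby does not have; the price is that those two commits block until the link returns.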

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
