Re: Synch failover WAS: Support for N synchronous standby servers - take 2

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Date: 2015-07-02 21:54:19
Message-ID: 5595B30B.9030605@agliodbs.com
Lists: pgsql-hackers

On 07/02/2015 12:44 PM, Andres Freund wrote:
> On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
>> So there's two parts to this:
>>
>> 1. I need to ensure that data is replicated to X places.
>>
>> 2. I need to *know* which places data was synchronously replicated to
>> when the master goes down.
>>
>> My entire point is that (1) alone is useless unless you also have (2).
>
> I think there's a good set of use cases where that's really not the case.

Please share! My plea for use cases was sincere. I can't think of any.
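Requirement (1), where a commit counts as replicated only once X standbys have it, can be sketched as follows. This is a minimal sketch that assumes each standby reports its flush position as an integer LSN. The function names are hypothetical, and this is not PostgreSQL's actual implementation. The k-th highest flush LSN is the newest position that at least k standbys are known to hold. This bookkeeping happens on the master, and that is exactly the knowledge that is lost when the master goes down.

```python
from typing import List


def quorum_safe_lsn(flush_lsns: List[int], k: int) -> int:
    """Return the highest LSN that at least k standbys have flushed.

    Hypothetical helper: flush_lsns holds one integer LSN per standby.
    """
    if not 1 <= k <= len(flush_lsns):
        raise ValueError("k must be between 1 and the number of standbys")
    # Sort descending; every standby at or before index k-1 has flushed
    # at least the k-th largest LSN, so that position is held on >= k nodes.
    return sorted(flush_lsns, reverse=True)[k - 1]


def commit_is_durable(commit_lsn: int, flush_lsns: List[int], k: int) -> bool:
    """A commit may be acknowledged once the quorum-safe LSN covers it."""
    return quorum_safe_lsn(flush_lsns, k) >= commit_lsn
```

For example, suppose the standbys are at LSNs 100, 250 and 180 and k = 2. The quorum-safe position is 180, so a commit at LSN 150 can be acknowledged, while one at 200 must keep waiting.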

>> And do note that I'm talking about information on the replica, not on
>> the master, since in any failure situation we don't have the old
>> master around to check.
>
> How would you, even theoretically, synchronize that knowledge to all the
> replicas? Even when they're temporarily disconnected?

You can't, which is why what we need is the replica's own record of when
it was last in sync. That is, a timestamp and LSN from the last time the
replica successfully ack'd a synchronous commit back to the master.
Based on that information, I can make an informed decision, even if I'm
down to one replica.

>> ... because we would know definitively which servers were in sync. So
>> maybe that's the use case we should be supporting?
>
> If you want automated failover you need a leader election amongst the
> surviving nodes. The replay position is all they need to elect the node
> that's furthest ahead, and that information exists today.
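The election Andres describes picks the surviving node whose replay position is furthest ahead. It might look like the following sketch. The sketch assumes each replica's position has already been collected as text, for example from pg_last_xlog_replay_location() (renamed pg_last_wal_replay_lsn() in PostgreSQL 10). The node names and helper functions are hypothetical. LSNs print as two hexadecimal halves, "high/low", so they have to be converted to integers before they can be compared.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a comparable integer."""
    hi, lo = text.split("/")
    # High half holds the upper 32 bits, low half the lower 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def elect_leader(replay_lsns: Dict[str, str]) -> str:
    """Pick the surviving node whose replay position is furthest ahead.

    Hypothetical helper: replay_lsns maps node name -> LSN text.
    On a tie, max() returns the first such node in dict order.
    """
    if not replay_lsns:
        raise ValueError("no surviving replicas")
    return max(replay_lsns, key=lambda node: parse_lsn(replay_lsns[node]))


print(elect_leader({
    "replica1": "0/3000060",
    "replica2": "0/30000A8",
    "replica3": "0/2FFFFF0",
}))
```

This prints `replica2`. Comparing the raw strings would give wrong answers, because the halves are not zero-padded. For example, "0/A0" sorts after "0/100" as text, even though 0xA0 < 0x100.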

I can do that already. If quorum synch commit doesn't help us minimize
data loss any better than async replication or the current single
synchronous standby, why would we want it? If it does help us minimize
data loss, how?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
