Re: Synch failover WAS: Support for N synchronous standby servers - take 2

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Date: 2015-07-03 08:59:03
Message-ID: CAD21AoAVj7EypB1dG7LECzsyDV4+nZWPJbiyaTGZhdMWhr1EAw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> On 07/02/2015 12:44 PM, Andres Freund wrote:
>>> On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
>>>> So there's two parts to this:
>>>>
>>>> 1. I need to ensure that data is replicated to X places.
>>>>
>>>> 2. I need to *know* which places data was synchronously replicated to
>>>> when the master goes down.
>>>>
>>>> My entire point is that (1) alone is useless unless you also have (2).
>>>
>>> I think there's a good set of usecases where that's really not the case.
>>
>> Please share! My plea for usecases was sincere. I can't think of any.
>>
>>>> And do note that I'm talking about information on the replica, not on
>>>> the master, since in any failure situation we don't have the old
>>>> master around to check.
>>>
>>> How would you, even theoretically, synchronize that knowledge to all the
>>> replicas? Even when they're temporarily disconnected?
>>
>> You can't, which is why what we need to know is when the replica thinks
>> it was last synced from the replica side. That is, a sync timestamp and
>> lsn from the last time the replica ack'd a sync commit back to the
>> master successfully. Based on that information, I can make an informed
>> decision, even if I'm down to one replica.
>>
>>>> ... because we would know definitively which servers were in sync. So
>>>> maybe that's the use case we should be supporting?
>>>
>>> If you want automated failover you need a leader election amongst the
>>> surviving nodes. The replay position is all they need to elect the node
>>> that's furthest ahead, and that information exists today.
>>
>> I can do that already. If quorum synch commit doesn't help us minimize
>> data loss any better than async replication or the current 1-redundant,
>> why would we want it? If it does help us minimize data loss, how?
>
> In your example of "2" : { "local_replica", "london_server", "nyc_server" },
> if there is not something like quorum commit, only local_replica is synch
> and the other two are async. In this case, if the local data center gets
> destroyed, you need to promote either london_server or nyc_server. But
> since they are async, they might not have the data which have been already
> committed in the master. So data loss! Of course, as I said yesterday,
> they might have all the data and no data loss happens at the promotion.
> But the point is that there is no guarantee that no data loss happens.
> OTOH, if we use quorum commit, we can guarantee that either london_server
> or nyc_server has all the data which have been committed in the master.
>
> So I think that quorum commit is helpful for minimizing the data loss.
>

Yeah, quorum commit helps minimize data loss compared with today's
replication.
But in your example, how can we know which server we should promote to
be the next master after the local data center goes down?
If we choose the wrong one, we would lose data.
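
To make the question concrete, here is a rough sketch of the selection
step along the lines Andres suggested upthread (elect the node that is
furthest ahead). It is only an illustration, not a proposal: it assumes
every surviving standby is still reachable and can report its last
received WAL location as text in the usual "X/Y" hex form (for example
from pg_last_xlog_receive_location()). The helper names and the LSN
values are made up.

```python
def parse_lsn(lsn):
    """Convert an LSN string such as '0/3000148' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(reported):
    """Return the name of the standby with the highest reported LSN.

    `reported` maps a standby name to the LSN string it reported.
    Standbys that could not be reached must simply be left out, which
    is exactly the case where this approach can pick the wrong server.
    """
    if not reported:
        raise ValueError("no surviving standby reported a WAL location")
    return max(reported, key=lambda name: parse_lsn(reported[name]))


# Hypothetical values reported after the local data center is lost.
reported = {
    "london_server": "0/3000060",
    "nyc_server": "0/3000148",
}
candidate = pick_promotion_candidate(reported)  # "nyc_server"
```

Note that the LSNs are compared as numbers, not as strings, because
"0/A000000" sorts after "0/10000000" as text although it is the older
position. The sketch also only works if both remote servers answer; if
only one of them is reachable, it says nothing about whether that one
has all the committed data.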

Regards,

--
Sawada Masahiko
