Re: Synchronous Standalone Master Redoux

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Standalone Master Redoux
Date: 2012-07-17 05:58:57
Message-ID: 5004FF21.1020902@enterprisedb.com
Lists: pgsql-hackers

On 16.07.2012 22:01, Robert Haas wrote:
> On Sat, Jul 14, 2012 at 7:54 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> So, here's the core issue with degraded mode. I'm not mentioning this
>> to block any patch anyone has, but rather out of a desire to see someone
>> address this core problem with some clever idea I've not thought of.
>> The problem in a nutshell is: indeterminacy.
>>
>> Assume someone implements degraded mode. Then:
>>
>> 1. Master has one synchronous standby, Standby1, and two asynchronous,
>> Standby2 and Standby3.
>>
>> 2. Standby1 develops a NIC problem and is in and out of contact with
>> Master. As a result, it's flipping in and out of synchronous / degraded
>> mode.
>>
>> 3. Master fails catastrophically due to a RAID card meltdown. All data
>> lost.
>>
>> At this point, the DBA is in kind of a pickle, because he doesn't know:
>>
>> (a) Was Standby1 in synchronous or degraded mode when Master died? The
>> only log for that was on Master, which is now gone.
>>
>> (b) Is Standby1 actually the most caught up standby, and thus the
>> appropriate new master for Standby2 and Standby3, or is it behind?
>>
>> With the current functionality of Synchronous Replication, you don't
>> have either piece of indeterminacy, because some external management
>> process (hopefully located on another server) needs to disable
>> synchronous replication when Standby1 develops its problem. That is, if
>> the master is accepting synchronous transactions at all, you know that
>> Standby1 is up-to-date, and no data is lost.
>>
>> While you can answer (b) by checking all servers, (a) is particularly
>> pernicious, because unless you have the application log all "operating
>> in degraded mode" messages, there is no way to ever determine the truth.
>
> Good explanation.
>
> In brief, the problem here is that you can only rely on the
> no-transaction-loss guarantee provided by synchronous replication if
> you can be certain that you'll always be aware of it when synchronous
> replication gets shut off. Right now that is trivially true, because
> it has to be shut off manually. If we provide a facility that logs a
> message and then shuts it off, we lose that certainty, because the log
> message could get eaten en route by the same calamity that takes down
> the master. There is no way for the master to WAIT for the log
> message to be delivered and only then degrade.
>
> However, we could craft a mechanism that has this effect. Suppose we
> create a new GUC with a name like
> synchronous_replication_status_change_command. If we're thinking
> about switching between synchronous replication and degraded mode
> automatically, we first run this command. If it returns 0, then we're
> allowed to switch, but if it returns anything else, then we're not
> allowed to switch (but can retry the command after a suitable
> interval). The user is responsible for supplying a command that
> records the status change somewhere off-box in a fashion that's
> sufficiently durable that the user has confidence that the
> notification won't subsequently be lost. For example, the
> user-supplied command could SSH into three machines located in
> geographically disparate data centers and create a file with a certain
> name on each one, returning 0 only if it's able to reach at least two
> of them and create the file on all the ones it can reach. If the
> master dies, but at least two of those three machines are still
> alive, we can determine with confidence whether the master might
> have been in degraded mode at the time of the crash.
>
> More or less paranoid versions of this scheme are possible depending
> on user preferences, but the key point is that for the
> no-transaction-loss guarantee to be of any use, there has to be a way
> to reliably know whether that guarantee was in effect at the time the
> master died in a fire. Logging isn't enough, but I think some more
> sophisticated mechanism can get us there.

Yeah, I think that's the right general approach. Not necessarily that
exact GUC, but something like that. I don't want PostgreSQL to get more
involved in determining the state of the standby, when to do failover,
or when to fall back to degraded mode. That's a whole new territory with
all kinds of problems, and there is plenty of software out there to
handle that. Usually you have some external software to do monitoring
and to initiate failovers anyway. What we need is a better API for
co-operating with such software, to perform failover, and to switch
replication between synchronous and asynchronous modes.
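To make that concrete, here's a minimal sketch of the kind of
user-supplied command Robert describes, written in Python. The witness
host names, the marker file path, and the convention that the master
passes the new status as the first argument are all made up for
illustration. It records the requested status change on three off-box
witnesses over ssh and exits 0, allowing the switch, only if at least
two of them acknowledge it:

#!/usr/bin/env python
# Hypothetical sketch of a script to plug into the proposed
# synchronous_replication_status_change_command.  It records the
# requested status change ("synchronous" or "degraded") on several
# off-box witness machines over ssh, and exits 0 -- allowing the
# master to switch -- only if a quorum of them acknowledged it.
# Host names, paths, and the argument convention are made up.
import subprocess
import sys

WITNESS_HOSTS = ["witness1.example.com",
                 "witness2.example.com",
                 "witness3.example.com"]
QUORUM = 2   # must reach at least two of the three witnesses

def record_on(host, new_status):
    # Write a marker file on one witness; succeeds only if ssh
    # connects and the remote command exits cleanly.
    remote_cmd = "echo %s > /var/lib/pgsql/master-syncrep-status" % new_status
    return subprocess.call(["ssh", host, remote_cmd]) == 0

def main():
    new_status = sys.argv[1]          # e.g. "degraded"
    acked = sum(1 for host in WITNESS_HOSTS if record_on(host, new_status))
    sys.exit(0 if acked >= QUORUM else 1)

if __name__ == "__main__":
    main()

More paranoid variants would use more witnesses or a proper consensus
service, but the shape is the same: the switch is only allowed once the
fact has been durably recorded somewhere that survives the master.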

BTW, one little detail that I don't think has been mentioned in this
thread before: Even though the master currently knows whether a standby
is connected or not, and you could write a patch to act based on that,
there are other failure scenarios where you would still not be happy.
For example, imagine that the standby has a disk failure. It stays
connected to the master, but fails to fsync anything to disk. Would you
want to fall back to degraded mode and just do asynchronous replication
in that case? How do you decide when to do that in the master? Or what
if the standby keeps making progress, but becomes incredibly slow for
some reason, like disk failure in a RAID array? I'd rather outsource all
that logic to external monitoring software - software that you should be
running anyway.
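
Just to illustrate what I mean by outsourcing it, here's a rough sketch
of the kind of health check such monitoring software might run against
the master, using the pg_stat_replication columns we have today
(flush_location, sync_state). The connection string, thresholds, and
psycopg2 usage are only illustrative, and deciding what to actually do
when the check fires is still the monitoring software's job:

# Rough sketch of a standby health check on the external monitoring
# side: is a synchronous standby connected, and is its reported flush
# location still advancing?  Connection string and thresholds are
# hypothetical; column names are those of pg_stat_replication in
# 9.1/9.2 (flush_location, sync_state).
import time
import psycopg2

def sync_standby_flush_location(conn):
    # Flush location reported by the synchronous standby, or None if
    # no synchronous standby is currently connected.
    cur = conn.cursor()
    cur.execute("""SELECT flush_location
                     FROM pg_stat_replication
                    WHERE sync_state = 'sync'""")
    row = cur.fetchone()
    return row[0] if row else None

def standby_looks_dead(conn, checks=6, interval=10):
    # True if the sync standby is missing, or its flush location has
    # not advanced, across the given number of consecutive checks.
    last = sync_standby_flush_location(conn)
    for _ in range(checks):
        time.sleep(interval)
        loc = sync_standby_flush_location(conn)
        if loc is not None and loc != last:
            return False      # alive and making progress
        last = loc
    return True

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=postgres")
    conn.autocommit = True    # don't hold a transaction open between checks
    if standby_looks_dead(conn):
        # Here the monitoring software would record the status change
        # off-box (as in the earlier sketch) and then reconfigure the
        # master to stop waiting for this standby, e.g. by clearing
        # synchronous_standby_names and reloading the config.
        print("sync standby gone or stalled; consider degrading")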

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
