Re: Sync Rep Design

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2011-01-01 17:49:44
Message-ID: 4D1F6938.1090101@kaltenbrunner.cc
Lists: pgsql-hackers

On 01/01/2011 06:29 PM, Simon Riggs wrote:
> On Sat, 2011-01-01 at 18:13 +0100, Stefan Kaltenbrunner wrote:
>> On 01/01/2011 05:55 PM, Simon Riggs wrote:
>>>
>>> It appears to me there has been substantial confusion over alternatives,
>>> because of a misunderstanding about how synchronisation works. Requiring
>>> confirmation that standbys are in sync is *not* the same thing as them
>>> actually being in sync. Every single proposal made by anybody here on
>>> hackers that supports multiple standby servers suffers from the same
>>> issue: when the primary crashes you need to work out which standby
>>> server is ahead.
>>
>> aaah that was exactly what I was after - so the problem is that when you
>> have a sync standby it will technically always be "in front" of the
>> master (because it needs to fsync/apply/whatever before the master).
>> In the end the question boils down to what is "the bigger problem" in
>> the case of a lost master:
>
>> a) a transaction that was confirmed on the master but might not be on
>> any of the surviving sync standbys (or you will never know if it is) -
>> this is how I understand the proposal so far
>
> No that cannot happen, the current situation is that we will fsync WAL
> on the master, then fsync WAL on the standby, then reply to the master.
> The standby is never ahead of the master, at any point.

hmm, maybe my "surviving standbys" (the case I'm wondering about is whole-
datacenter failures, which might take out more than just the master) was
not clear - consider three boxes, one master and two standbys, and
semi-sync replication (i.e. any one of the standbys is enough to reply).

1. master fsyncs wal
2. standby #1 fsyncs and replies
3. master confirms commit
4. disaster strikes and destroys the master and standby #1 while standby #2
never had time to apply the change (IO/CPU load, latency, whatever)
5. now you have a sync standby that is missing something that was
committed on the master and confirmed to the client, and no way to verify
that this thing happened (same problem with more than two standbys - as
long as you lose ONE standby and the master at the same time you will
never be sure)
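To make steps 1-5 concrete, here is a toy model of the scenario (a hedged sketch: `Node` and `commit_any_one` are hypothetical names, and this does not correspond to PostgreSQL's actual walsender/walreceiver code; it only tracks which boxes have a transaction's WAL fsynced):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    wal: set = field(default_factory=set)  # transaction ids fsynced to WAL on this box
    alive: bool = True


def commit_any_one(master, standbys, xid, fast):
    """Hypothetical 'any one standby is enough' commit.

    `fast` holds the names of the standbys that manage to fsync in time;
    the others are assumed (worst case) not to have the WAL yet.
    """
    master.wal.add(xid)          # 1. master fsyncs WAL
    acks = 0
    for s in standbys:
        if s.name in fast:
            s.wal.add(xid)       # 2. standby fsyncs and replies
            acks += 1
    return acks >= 1             # 3. master confirms commit once one ack arrived


master = Node("master")
s1, s2 = Node("standby1"), Node("standby2")
confirmed = commit_any_one(master, [s1, s2], xid=42, fast={"standby1"})

# 4. disaster destroys the master and standby #1 before standby #2 catches up
master.alive = s1.alive = False

# 5. check whether any surviving box still has the confirmed transaction
survivors = [n for n in (master, s1, s2) if n.alive]
print("confirmed to client:", confirmed)
print("present on a survivor:", any(42 in n.wal for n in survivors))
```

The commit is reported to the client, yet no surviving box holds it, and nothing on standby #2 can tell you that it is missing.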

what is it that I'm missing here?

>
>> b) a transaction that was not yet confirmed on the master but might have
>> been applied on the surviving standby before the disaster - this is what I
>> understand "confirm from all sync standbys" could result in.
>
> Yes, that is described in the docs changes I published.
>
> (a) was discussed, but ruled out, since it would require any crash/immed
> shutdown of the master to become a failover, or have some kind of weird
> back channel to give the missing data back.
>
> There hasn't been any difference of opinion in this area, that I am
> aware of. All proposals have offered (b).

hmm, I'm confused now - any chance you mixed up a) and b) here? In a) no
back channel is needed, because the standby could just fetch the
missing data from the master.
If that is the case, I agree that it would be hard to get replication
up again after a crash of the master with a standby that is ahead, but in
the end it would be a business decision (as in conflict resolution) on
what to do - take the "ahead" standby's data and use that, or destroy the
old standby and recreate it.
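One way to frame a) versus b) is as "wait for k of n standbys" with k = 1 versus k = n. The sketch below enumerates worst cases under that framing; it is an illustrative assumption, not the design under discussion, and `confirmed_commit_can_be_lost` is a hypothetical helper. It assumes standbys that did not ack do not have the WAL yet, and asks whether losing the master plus f standbys can leave no survivor with a confirmed commit:

```python
from itertools import combinations


def confirmed_commit_can_be_lost(n, k, f):
    """True if some choice of k acking standbys and f failed standbys
    (together with the master) leaves no surviving node with the commit.

    Worst-case assumption: only the k standbys that acked have the WAL.
    """
    standbys = range(n)
    for acked in combinations(standbys, k):
        for failed in combinations(standbys, f):
            # every standby that acked is among the failed ones -> commit gone
            if set(acked) <= set(failed):
                return True
    return False


for n, k, f in [(2, 1, 1), (2, 2, 1), (3, 1, 1), (3, 2, 1), (3, 2, 2)]:
    lost = confirmed_commit_can_be_lost(n, k, f)
    print(f"standbys={n} wait_for={k} lost_standbys={f}: can lose confirmed commit = {lost}")
```

Under these assumptions a confirmed commit can vanish exactly when at least k standbys die with the master, so "any one" (k = 1) is exposed as soon as one standby goes down with it. Waiting for all standbys (k = n) closes that hole, at the price of waiting on the slowest standby and of survivors possibly holding transactions that were never confirmed to the client, which is case b).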

Stefan
