Re: Standalone synchronous master

From: Alexander Björnhagen <alex(dot)bjornhagen(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2011-12-27 11:39:22
Message-ID: CAO-C5==qfoWkiLhs0SEKmPHsQ-kmWNNpQwVa1JmQOh1J3OZong@mail.gmail.com
Lists: pgsql-hackers

Okay,

Here’s version 3 then, which piggy-backs on the existing flag:

synchronous_commit = on | off | local | fallback

Where “fallback” now means “fall back from sync replication when no
(suitable) standbys are connected”.
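
As a rough illustration of the semantics described above (a simplified model, not the code in the attached patch), the four values could be read as follows. The names `SyncCommit` and `must_wait_for_standby` are hypothetical, and the sketch assumes synchronous replication is otherwise configured on the master:

```python
from enum import Enum


class SyncCommit(Enum):
    """Values of synchronous_commit as listed in this message."""
    ON = "on"
    OFF = "off"
    LOCAL = "local"
    FALLBACK = "fallback"  # proposed value, only exists with the patch


def must_wait_for_standby(mode: SyncCommit, sync_standbys_connected: int) -> bool:
    """Return True if a commit should block until a standby acknowledges it.

    Simplified model: assumes synchronous replication is configured, and
    ignores the difference between 'off' and 'local' (local WAL flush).
    """
    if mode is SyncCommit.ON:
        # Plain sync replication: wait even when no standby is connected.
        return True
    if mode is SyncCommit.FALLBACK:
        # Proposed behaviour: wait only while a suitable standby is connected.
        return sync_standbys_connected > 0
    # 'off' and 'local' never wait for a standby.
    return False
```

With `fallback` and zero connected standbys the function returns `False`, which is the "standalone" behaviour this thread is about; with `on` it returns `True` regardless, so commits block until a standby comes back.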

This was done on input from Guillaume Lelarge.

> That said, I agree it's not necessarily reasonable to try to defend
> against that in a two node cluster.

That’s what I’ve been trying to say all along but I didn’t give enough
context before so I understand we took a turn there.

You can always walk up to any setup and say “hey, if you nuke that
site from orbit and crash that other thing, and ...” ;) I’m just
kidding of course but you get the point. Nothing is absolute.

And so we get back to the three likelihoods in our two-node setup:

1. The master fails
- Okay, promote the standby

2. The standby fails
- Okay, the system still works but you no longer have data
redundancy. Deal with it.

3. Both fail, together or one after the other.

I’ve stated that 1 and 2 together cover way more than 99.9% of what’s
expected in my setup on any given day.

But 3 is what we’ve been talking about ... And in that case you
should not just go ahead and promote the standby because, granted, it
could be lagging behind if the master switched to standalone mode just
before going down itself.

As long as you do not prematurely or rather instinctively promote the
standby when it has *possibly* lagged behind, you’re good and there is
no risk of data loss. The data might be sitting on a crashed or
otherwise unavailable master, but it’s not lost. Promoting the standby
however is basically saying “forget the master and its data, continue
from where the standby is currently at”.

Now granted this is operationally harder/more complicated than just
synchronous replication where you can always, in any case, just
promote the standby after a master failure, knowing that all data is
guaranteed to be replicated.

> I'm worried that the interface seems a bit
> fragile and that it's hard to "be sure".

With this setup, you can’t safely promote the standby without first
checking whether the replication link was disconnected prior to the
master failure.
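
The promotion rule, combined with the three scenarios listed earlier, can be sketched as a small decision function. This is only an illustration under stated assumptions: `Observation`, `Action` and `recommended_action` are hypothetical names, and how the operator learns that the link dropped before the master failed (logs, monitoring, and so on) is left outside the sketch:

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NOTHING = "nothing to do"
    PROMOTE_STANDBY = "promote the standby"
    RUN_STANDALONE = "keep the master running, restore the standby"
    RECOVER_MASTER = "do not promote, recover the master first"


@dataclass
class Observation:
    master_alive: bool
    standby_alive: bool
    # Assumption: the operator can tell whether the replication link was
    # already down (master in standalone mode) before the master failed.
    link_dropped_before_master_failure: bool = False


def recommended_action(obs: Observation) -> Action:
    if obs.master_alive:
        # Scenario 2 (or a healthy cluster): the master keeps serving.
        return Action.NOTHING if obs.standby_alive else Action.RUN_STANDALONE
    if not obs.standby_alive or obs.link_dropped_before_master_failure:
        # Scenario 3: the standby is gone or may be lagging, so the
        # newest data may exist only on the failed master.
        return Action.RECOVER_MASTER
    # Scenario 1: the standby was in sync when the master failed.
    return Action.PROMOTE_STANDBY
```

For example, `recommended_action(Observation(False, True, True))` yields `Action.RECOVER_MASTER`: the standby is up, but because the link dropped first, promoting it would discard whatever the master committed in standalone mode.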

For me, the benefits outweigh this one drawback because I get a
stronger standby replication guarantee than with async replication and
more master availability than with sync replication in the most
plausible outcomes.

Cheers,

/A

Attachment Content-Type Size
sync-standalone-v3.patch application/octet-stream 10.5 KB
