Re: Issues with Quorum Commit

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Issues with Quorum Commit
Date:	2010-10-08 15:12:49
Message-ID:	4CAF34F1.8000502@bluegap.ch
Lists:	pgsql-hackers

On 10/08/2010 04:48 PM, Fujii Masao wrote:
> I believe many systems require write-availability.

Sure. Make sure you have enough standbys to fail over to.

(I think there are even more situations where read-availability is much
more important, though).

>> Start with 0 (i.e. replication off), then add standbys, then increase
>> quorum_commit to your new requirements.
>
> No. This only makes the procedure of failover more complex.

Huh? This doesn't affect fail-over at all. Quite the opposite, the
guarantees and requirements remain the same even after a fail-over.
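
To make the quorum_commit idea concrete, here is a minimal sketch of the
rule as I read it from this thread. The Standby class and the
commit_allowed() function are hypothetical names made up for
illustration, not the proposed patch; the only thing taken from the
discussion is that 0 means replication is off and that a higher value
requires that many standbys to confirm.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    name: str
    connected: bool = True

    def ack(self) -> bool:
        # Assumption for this sketch: a connected standby always confirms.
        return self.connected


def commit_allowed(standbys: list[Standby], quorum_commit: int) -> bool:
    """Return True if at least quorum_commit standbys confirm the commit."""
    if quorum_commit == 0:
        # Replication off: the master never waits for a standby.
        return True
    acks = sum(1 for s in standbys if s.ack())
    return acks >= quorum_commit


if __name__ == "__main__":
    standbys = [Standby("B"), Standby("C", connected=False)]
    print(commit_allowed(standbys, 1))
    print(commit_allowed(standbys, 2))
```

The same check applies no matter which node currently acts as the
master, which is why raising the value does not change the fail-over
procedure in this model.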

> What is a full-cluster crash?

The event that all of your cluster nodes are down (most probably due to
a power failure, but fires or other catastrophic events can be other
causes). The chances of that happening can certainly be reduced by
distributing the nodes to distant locations, but that just as certainly
increases latency, which isn't always acceptable.

> Why does it cause a split-brain?

First, master node A fails and a standby B takes over, but then fails as
well. Let node C take over. Then the power generator catches fire: the
infamous full-cluster crash (where "lights out management" gets a
completely new meaning ;-) ).

Split brain would be the situation that arises if all three nodes (A, B
and C) start up again and each thinks it was the last master, so it
continues to accept new transactions. Their data diverges, leading to
what could be seen as a split-brain from the outside.

Obviously, you must prevent A and B from taking the role of the master
after recovery. Ideally, C would continue as the master. However, if the
fire destroyed node C, let's hope you had another (sync!) standby that
can act as the new master. Otherwise you've lost data.
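
One way to picture the rule above is a small sketch in which every node
durably remembers a counter that is bumped on each promotion. The names
Node, epoch and master_candidates() are assumptions invented for this
illustration and do not describe what Postgres actually does; the sketch
only shows that, after a full-cluster crash, the nodes with the newest
counter are the only safe candidates.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    epoch: int            # hypothetical counter, bumped on every promotion
    survived: bool = True


def master_candidates(nodes):
    """Return the names of surviving nodes that saw the newest epoch."""
    survivors = [n for n in nodes if n.survived]
    if not survivors:
        return []
    newest = max(n.epoch for n in survivors)
    return [n.name for n in survivors if n.epoch == newest]


if __name__ == "__main__":
    # A was master in epoch 1, B in epoch 2, C in epoch 3.
    cluster = [Node("A", 1), Node("B", 2), Node("C", 3)]
    print(master_candidates(cluster))
```

If C is marked as destroyed and a sync standby D that followed C through
epoch 3 is added, D becomes the only candidate. Without such a standby
the function falls back to B, and whatever C committed after B failed is
gone: the survivors alone cannot even tell that a newer epoch existed.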

Hope that explains it. Wikipedia certainly provides a better (and less
Postgres-colored) explanation.

Regards

Markus Wanner

In response to

Re: Issues with Quorum Commit at 2010-10-08 14:48:54 from Fujii Masao

Responses

Re: Issues with Quorum Commit at 2010-10-13 04:43:57 from Fujii Masao
