Re: Issues with Quorum Commit

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-13 09:22:41
Message-ID: AANLkTimGC3i2=dge3EA3cZiNv_yHf3GHz7VtUqPOB7T_@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> There's another problem here we should think about, too.  Suppose you
> have a master and two standbys.  The master dies.  You promote one of
> the standbys, which turns out to be behind the other.  You then
> repoint the other standby at the one you promoted.  Congratulations,
> your database is now very possibly corrupt, and you may very well get
> no warning of that fact.  It seems to me that we would be well-advised
> to install some kind of bullet-proof safeguard against this kind of
> problem, so that you will KNOW that the standby needs to be re-synced.

Yep. This is why I said it's not easy to implement that.

To start a standby after failover without taking a fresh base backup from
the new master, the user basically has to promote the standby that is
furthest ahead of the others (e.g., by comparing
pg_last_xlog_replay_location() on each standby).
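
The comparison could be scripted roughly as below. This is only a sketch:
it assumes the replay locations have already been collected from each
standby as the "X/Y" hexadecimal text that pg_last_xlog_replay_location()
returns, and the function names and standby labels are made up for
illustration. The locations are compared as numbers rather than as strings,
since a plain string comparison gives the wrong order when the hex fields
differ in length.

```python
def parse_location(text):
    """Split an 'X/Y' WAL location into a comparable (high, low) tuple of ints."""
    high, low = text.strip().split("/")
    return (int(high, 16), int(low, 16))


def most_advanced_standby(replay_locations):
    """Return the name of the standby with the greatest replay location.

    replay_locations maps a standby label (hypothetical) to the text that
    pg_last_xlog_replay_location() returned on that standby.
    """
    if not replay_locations:
        raise ValueError("no standby reported a replay location")
    return max(replay_locations,
               key=lambda name: parse_location(replay_locations[name]))


if __name__ == "__main__":
    # Example values are invented; real ones come from querying each standby.
    reported = {"standby1": "0/9A000020", "standby2": "0/A1000058"}
    print(most_advanced_standby(reported))
```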

As a safeguard, we seem to need to compare the location of the timeline
switch on the master with the last replay location on the standby.
If the latter is ahead AND the timeline ID of the standby is not
the same as that of the master, we should emit a warning and terminate the
replication connection.
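
In pseudo-Python, the check I have in mind would look something like this.
It is a sketch of the rule described above, not existing server code: the
function and parameter names are hypothetical, and it assumes both
locations are available as "X/Y" hexadecimal text.

```python
def parse_location(text):
    """Split an 'X/Y' WAL location into a comparable (high, low) tuple of ints."""
    high, low = text.strip().split("/")
    return (int(high, 16), int(low, 16))


def must_reject_standby(standby_tli, standby_replay_location,
                        master_tli, master_switch_location):
    """Return True if the standby should be refused and re-synced.

    The standby is refused when it has replayed past the point where the
    master switched timelines AND it is not on the master's timeline.
    All parameter names here are hypothetical.
    """
    ahead_of_switch = (parse_location(standby_replay_location)
                       > parse_location(master_switch_location))
    return ahead_of_switch and standby_tli != master_tli


if __name__ == "__main__":
    # Invented example: the standby replayed beyond the switch point on the
    # old timeline, so the connection should be terminated with a warning.
    if must_reject_standby(1, "0/A1000058", 2, "0/9A000020"):
        print("WARNING: standby is ahead of the timeline switch; "
              "terminating replication connection")
```

A standby that stopped replaying before the switch location, or that is
already on the master's timeline, passes the check under this rule.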

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
