Re: Issues with Quorum Commit

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-13 06:50:06
Message-ID: AANLkTikfUgHhs=sBP2N1xPQZ4TECgC6wE3CU1JZBrpJj@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 13, 2010 at 2:43 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 13.10.2010 08:21, Fujii Masao wrote:
>>
>> On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>>
>>> It shouldn't be too hard to fix. Walsender needs to be able to read WAL
>>> from preceding timelines, like recovery does, and walreceiver needs to
>>> write the incoming WAL to the right file.
>>
>> And walsender seems to need to transfer the current timeline history to
>> the standby. Otherwise, the standby cannot recover the WAL file with the
>> new timeline. And the standby might need to create the timeline history
>> file in order to recover the WAL file with the new timeline even after
>> it's restarted.
>
> Yes, true, you need that too.
>
> It might be good to divide this work into two phases, teaching archive
> recovery to notice new timelines appearing in the archive first, and doing
> the walsender/walreceiver changes after that.
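To make "read WAL from preceding timelines" concrete, here is a minimal sketch assuming a simplified model of the timeline history as an ordered list of (timeline, begin, end) entries, with WAL positions as plain integers. The names `HistoryEntry` and `timeline_of_position` are hypothetical and are not PostgreSQL functions; the point is only that, given the history, a sender can decide which timeline's WAL holds a requested position.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HistoryEntry:
    """One entry of a simplified (hypothetical) timeline history.

    Timeline `tli` covered WAL positions from `begin` up to, but not
    including, `end`, the switch point where the next timeline forked off.
    `end is None` marks the server's current timeline.
    """
    tli: int
    begin: int
    end: Optional[int]


def timeline_of_position(history: Sequence[HistoryEntry], pos: int) -> int:
    """Return the timeline whose WAL contains position `pos`.

    A position exactly at a switch point belongs to the newer timeline,
    since that is where the new timeline's WAL starts.
    """
    for entry in history:
        if entry.begin <= pos and (entry.end is None or pos < entry.end):
            return entry.tli
    raise ValueError(f"position {pos} is not covered by this history")


# Timeline 1 was in use until position 100, where timeline 2 forked off.
history = [HistoryEntry(1, 0, 100), HistoryEntry(2, 100, None)]
print(timeline_of_position(history, 50))   # 1
print(timeline_of_position(history, 100))  # 2
```

The same history also tells the receiving side which timeline ID belongs in the name of the file it writes, which is why the history itself has to reach the standby.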

There's another problem here we should think about, too. Suppose you
have a master and two standbys. The master dies. You promote one of
the standbys, which turns out to be behind the other. You then
repoint the other standby at the one you promoted. Congratulations,
your database is now very possibly corrupt, and you may very well get
no warning of that fact. It seems to me that we would be well-advised
to install some kind of bullet-proof safeguard against this kind of
problem, so that you will KNOW that the standby needs to be re-synced.
I mention this because I have a vague feeling that timelines are
supposed to prevent you from getting different WAL histories confused
with each other, but they don't actually cover all the cases that can
happen.
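One way to picture such a safeguard is a check the standby (or the server it connects to) could run before streaming begins. This is a minimal sketch under assumptions: the promoted server's timeline history is modeled as (timeline, begin, end) entries with WAL positions as plain integers, and the names `HistoryEntry` and `standby_can_follow` are hypothetical, not PostgreSQL code. The rule is that a standby may follow only if its own timeline appears in the server's ancestry and it has not flushed WAL past the point where that timeline was abandoned.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HistoryEntry:
    """One entry of a simplified (hypothetical) timeline history.

    Timeline `tli` covered WAL positions from `begin` up to `end`, the
    switch point where the next timeline forked off. `end is None` marks
    the server's current timeline.
    """
    tli: int
    begin: int
    end: Optional[int]


def standby_can_follow(history: Sequence[HistoryEntry],
                       standby_tli: int, standby_flush: int) -> bool:
    """Return True if a standby that has flushed WAL up to `standby_flush`
    on timeline `standby_tli` can safely stream from a server whose
    timeline history is `history`."""
    for entry in history:
        if entry.tli == standby_tli:
            # Safe only if the standby has written nothing beyond the point
            # where the server's history left the standby's timeline.
            return entry.end is None or standby_flush <= entry.end
    # The standby's timeline is not an ancestor of the server's: diverged.
    return False


# Standby A was promoted after replaying up to position 100 on timeline 1.
history = [HistoryEntry(1, 0, 100), HistoryEntry(2, 100, None)]
print(standby_can_follow(history, 1, 150))  # False: B is past the fork point
print(standby_can_follow(history, 1, 80))   # True: B can replay up to 100, then switch
```

Comparing against the fork point, rather than against the promoted server's current WAL position, is what catches the case described above: the other standby's position may well be below the new master's current position and still contain WAL from timeline 1 that the promoted standby never received.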

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company