Re: streaming replication breaks horribly if master crashes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-17 05:57:55
Message-ID: AANLkTinI949X-OWATDntssWPnRVO5JxkLbdSCpvDl-e6@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> The first problem I noticed is that the slave never seems to realize
>>> that the master has gone away.  Every time I crashed the master, I had
>>> to kill the wal receiver process on the slave to get it to reconnect;
>>> otherwise it just sat there waiting, either forever or at least for
>>> longer than I was willing to wait.
>>
>> Yes, I've noticed this.  That was the reason for forcing walreceiver to
>> shut down on a restart per prior discussion and patches.  This needs to
>> be on the open items list ... possibly it'll be fixed by Simon's
>> keepalive patch?  Or is it just a tcp_keepalive issue?
>
> I think a TCP keepalive might be enough, but I have not tried to code
> or test it.

The "keepalive on libpq" patch would help.
https://commitfest.postgresql.org/action/patch_view?id=281
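
For illustration only, here is a minimal sketch of what enabling TCP
keepalive on a client socket looks like at the socket level. This is not
taken from the patch above; the function name enable_keepalive and the
idle_secs parameter are hypothetical, and TCP_KEEPIDLE is only available
on some platforms (Linux, for example), so it is guarded with #ifdef.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Turn on TCP keepalive probes for an already-created socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 * idle_secs is a hypothetical parameter; it is applied only where the
 * platform defines TCP_KEEPIDLE.
 */
static int
enable_keepalive(int fd, int idle_secs)
{
    int on = 1;

    /* Ask the kernel to probe an idle connection so a dead peer is noticed. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Optionally shorten the idle time before the first probe is sent. */
    if (idle_secs > 0 &&
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
#endif

    (void) idle_secs;   /* unused where TCP_KEEPIDLE is not defined */
    return 0;
}
```

With something like this on the walreceiver's connection, a master that
vanishes without closing the socket should eventually be detected by the
kernel rather than leaving the slave waiting indefinitely.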

>>> and this just
>>> makes it more likely.  After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example?  I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots.  I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary).

The attached patch changes walsender so that it always sends WAL up to
LogwrtResult.Flush instead of LogwrtResult.Write.
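
As a rough sketch of the idea (not the patch itself), the send limit
becomes the flushed position rather than the written one. The types and
function names below (WalPos, LogwrtSketch, safe_send_limit,
bytes_sendable) are simplified, hypothetical stand-ins and are not the
actual PostgreSQL definitions.

```c
#include <stdint.h>

/* Simplified stand-in for a WAL location: a plain 64-bit byte offset. */
typedef uint64_t WalPos;

/* Hypothetical, simplified mirror of the two positions in LogwrtResult. */
typedef struct {
    WalPos write;   /* handed to the OS, may be lost if the master crashes */
    WalPos flush;   /* fsync'd to disk, survives a crash */
} LogwrtSketch;

/* Upper bound of WAL that is safe to stream: the flushed position only. */
static WalPos
safe_send_limit(const LogwrtSketch *r)
{
    return r->flush;
}

/* Bytes that may be sent now, given how far the standby has been sent. */
static uint64_t
bytes_sendable(const LogwrtSketch *r, WalPos already_sent)
{
    WalPos limit = safe_send_limit(r);

    return (limit > already_sent) ? (limit - already_sent) : 0;
}
```

Anything between flush and write stays on the master until it has been
fsync'd, so the slave should never hold WAL that the master could lose
in a crash.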

> But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking
> this failure mode.  Similarly with synchronous_commit=off, I believe
> that the next checkpoint will still fsync WAL, but the lag might be
> long.

First of all, we should not restart the master after a crash in the
fsync=off case. That would cause corruption of the master database
itself.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: send_after_fsync_v1.patch (application/octet-stream, 3.0 KB)
