
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-16 20:26:16
Message-ID: AANLkTilzOkAH35ViRb_2-YHrdKQLevx9p9biFVxzKen3@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away.  Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> Yes, I've noticed this.  That was the reason for forcing walreceiver to
> shut down on a restart per prior discussion and patches.  This needs to
> be on the open items list ... possibly it'll be fixed by Simon's
> keepalive patch?  Or is it just a tcp_keepalive issue?

I think a TCP keepalive might be enough, but I have not tried to code
or test it.
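For what it's worth, the OS-level knobs involved look roughly like this.  A
minimal sketch in Python of enabling keepalive on a client socket; the
idle/interval/count values are purely illustrative, and the per-connection
tuning options are Linux-specific (hence the hasattr guards):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive so a dead peer is detected after roughly
    idle + interval * count seconds of silence, instead of never."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-connection tuning of the probe timing is Linux-specific;
    # on other platforms only the system-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

# Usage: wrap the socket before connecting to the master.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
```

Without something like this, a walreceiver blocked in a read on a
half-dead connection will wait for however long the kernel defaults
allow, which matches the behavior described above.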

>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.  So far I've only been able to reproduce this
>> with fsync=off, but I believe it's possible anyway,
>
> ... and some users will turn fsync off.  This is, in fact, one of the
> primary uses for streaming replication: Durability via replicas.

Yep.

>> and this just
>> makes it more likely.  After the most recent crash, the master thought
>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>> the master, the slave then thought that
>> pg_last_xlog_receive_location() was 1/87000000.
>
> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
> have actually prevented the slave from being corrupted.
>
> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
> there any crash conditions on the master which would cause the master to
> reuse the same locations for different records, for example?  I don't
> think so, but I'd like to be certain.

The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots. I
believe that what we need to do is make sure that the master only
sends WAL it has already fsync'd (Tom suggested on another thread that
this might be necessary, and I think it's now clear that it is 100%
necessary). But I'm not sure how this will play with fsync=off - if
we never fsync, then we can't ever really send any WAL without risking
this failure mode. Similarly with synchronous_commit=off, I believe
that the next checkpoint will still fsync WAL, but the lag might be
long.
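To make the proposed invariant concrete, here is a toy sketch (hypothetical
names, not actual walsender code) of the rule: the streaming position must
never pass the position the master has durably flushed to disk:

```python
def sendable_upto(write_lsn, flush_lsn):
    """The master may stream WAL only up to what it has already
    fsync'd.  Anything between flush_lsn and write_lsn exists only
    in the master's memory and could vanish on a crash, so it must
    not reach the standby."""
    return min(write_lsn, flush_lsn)

# With fsync lagging behind writes, the send pointer is held back:
assert sendable_upto(write_lsn=0x8733C000, flush_lsn=0x86CD4000) == 0x86CD4000
```

With fsync=off the flush pointer effectively never advances ahead of a
checkpoint, which is why that setting makes the failure mode hard to
avoid under this rule.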

I think we should also change the slave to panic and shut down
immediately if its xlog position is ahead of the master. That can
never be a watertight solution because you can always advance the xlog
position on the master and mask the problem.  But I think we should
do it anyway, so that we at least have a chance of noticing that we're
hosed. I wish I could think of something a little more watertight...
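For the panic check, comparing positions just means parsing the textual
LSNs into integers.  A small illustrative sketch (not actual backend code),
using the positions from the crash described above:

```python
def parse_lsn(lsn):
    """Parse a PostgreSQL LSN like '1/86CD4000' into a 64-bit integer:
    high 32 bits are the xlog file id, low 32 bits the byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def standby_is_ahead(master_lsn, standby_lsn):
    """True when the standby has received WAL past the master's
    current position -- the divergence this check would catch."""
    return parse_lsn(standby_lsn) > parse_lsn(master_lsn)

# Master reported 1/86CD4000, standby had received up to 1/8733C000:
assert standby_is_ahead("1/86CD4000", "1/8733C000")
```

As noted, this only catches the problem until the master's position
advances past the standby's again, at which point the divergence is
invisible to a simple comparison.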

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
