Re: streaming replication breaks horribly if master crashes

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-16 20:32:45
Message-ID: AANLkTil01eBZVtqOWQqp2ZjAd1-JpY5l9PW3Lwn5P96o@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 16, 2010 at 22:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> and this just
>>> makes it more likely.  After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example?  I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots.  I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary).  But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking

Well, given how late we are in the cycle, we can just prevent streaming
replication with fsync=off for now if we can't think of an easy fix, and
then design a "proper fix" for 9.1.
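
To make the numbers quoted above concrete, here is a minimal sketch, not taken from the thread or from the PostgreSQL source. It compares the two xlog locations reported after the crash and shows that the slave had received WAL beyond the point the master still had. The helper names (parse_lsn, standby_is_ahead, streaming_allowed) are hypothetical, and streaming_allowed only illustrates the stopgap idea of refusing streaming replication when fsync is off; it assumes nothing about how the server would actually enforce that.

```python
def parse_lsn(text):
    """Split an 'xlogid/xrecoff' location (both parts hex) into two ints."""
    xlogid, xrecoff = text.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def standby_is_ahead(master_lsn, standby_lsn):
    """True if the standby has received WAL past the master's current location.

    Locations are compared as (xlogid, xrecoff) tuples, so no assumption
    is made about how the two parts map onto a single byte offset.
    """
    return parse_lsn(standby_lsn) > parse_lsn(master_lsn)


def streaming_allowed(fsync_enabled):
    """Hypothetical stopgap: only permit streaming when fsync is on."""
    return fsync_enabled


# Values reported in the quoted message after the master crashed.
master_after_crash = "1/86CD4000"   # pg_current_xlog_location() on the master
standby_received = "1/8733C000"     # pg_last_xlog_receive_location() on the slave

if standby_is_ahead(master_after_crash, standby_received):
    print("slave holds WAL that the master no longer has")

if not streaming_allowed(fsync_enabled=False):
    print("refusing streaming replication: fsync is off")
```

With the quoted values, both parts share xlogid 1 and the slave's offset is larger, so the first check fires. That is the situation a master that only sends already-fsync'd WAL is meant to avoid.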

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
