Re: streaming replication breaks horribly if master crashes

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-16 23:08:44
Message-ID: AANLkTinPCrNGbdxhxpqyyDthOyS4Na3UKDPYdyFX70BY@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away.  Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> TCP timeout is the answer there.

If you mean TCP Keepalives, I disagree quite strongly. If you want the
application to guarantee any particular timing constraints then you
have to implement that in the application using timers and data
packets. TCP keepalives are for detecting broken network connections,
not enforcing application rules. Using TCP timeouts would have a
number of problems: on many systems they are impossible or difficult
to adjust and, worse, it would make it impossible to distinguish a
Postgres master crash from a transient or permanent network outage.
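
To make the "timers and data packets" approach concrete, here is a minimal
sketch of a receiver-side liveness check. It is not PostgreSQL code: the
class name HeartbeatMonitor, the 10-second default timeout, and the
injectable clock are all assumptions made for illustration. The idea is
that the sender emits a small heartbeat packet whenever it is idle, and
the receiver treats prolonged silence as a failed peer.

```python
import time


class HeartbeatMonitor:
    """Tracks when the peer last sent any message (data or heartbeat).

    Hypothetical helper for illustration; not part of PostgreSQL.
    """

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()

    def on_message(self):
        # Call for every message received from the peer, including the
        # empty heartbeat packets the sender emits while it is idle.
        self._last_seen = self._clock()

    def peer_timed_out(self):
        # True once the peer has been silent for longer than the timeout.
        return self._clock() - self._last_seen > self.timeout_s
```

The receive loop would call on_message() for each packet and check
peer_timed_out() on a timer, dropping the connection and reconnecting
when it returns True. Because the timeout lives in the application, it
can be tuned without touching system-wide TCP settings.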

>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.
>
> So indeed we'd better change walsender to not get ahead of the fsync'd
> position.  And probably also warn people to not disable fsync on the
> master, unless they're willing to write it off and fail over at any
> system crash.
>
>> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.
>
> Doesn't seem tremendously insoluble from here ...

For the case of fsync=off I can't get terribly excited about the slave
being ahead of the master after a crash. After all, the master is toast
anyways. It seems to me in this situation the slave should detect that
the master has failed and automatically come up in master mode. Or
perhaps it should just shut down and then refuse to come up as a slave
again on the basis that it would be unsafe precisely because it might
be ahead of the (corrupt) master. At some point we should consider
having a server set to fsync=off refuse to come back up unless it was
shut down cleanly anyways. Perhaps we should add a strongly worded
warning now.

For the case of fsync=on it does seem to me to be terribly obvious
that the master should never send records to the slave that aren't
fsynced on the master. For 9.1 the other option proposed would work as
well but would be more complex: send and store records immediately,
but do not replay them on the slave until they're either fsynced on
the master or failover occurs.
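
The two rules above can be sketched as a pair of small functions. This is
only an illustration under stated assumptions: WAL positions are modelled
as plain integers, and the function and parameter names (sendable_upto,
replayable_upto, write_pos, flush_pos, received_pos, master_flush_pos) are
hypothetical, not actual walsender or recovery code.

```python
def sendable_upto(write_pos, flush_pos):
    """Master side: highest WAL position that is safe to stream.

    Never stream past what has been fsynced locally, even if more
    WAL has already been written.
    """
    return min(write_pos, flush_pos)


def replayable_upto(received_pos, master_flush_pos, failover):
    """Slave side, for the deferred-replay alternative.

    Records are received and stored immediately, but replay stops at
    the master's last known fsynced position unless failover occurs.
    """
    if failover:
        # The master is gone; everything received may be replayed.
        return received_pos
    return min(received_pos, master_flush_pos)
```

For example, with 120 units of WAL written but only 100 fsynced, the first
rule streams up to 100. Under the second rule a slave that has received
up to 120 replays only to 100 until it learns of a later fsync or is
promoted.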

--
greg
