Re: streaming replication breaks horribly if master crashes

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-16 20:14:20
Message-ID: 4C19309C.1090703@agliodbs.com
Lists: pgsql-hackers


> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.

Yes, I've noticed this. That was the reason for forcing walreceiver to
shut down on a restart per prior discussion and patches. This needs to
be on the open items list ... possibly it'll be fixed by Simon's
keepalive patch? Or is it just a tcp_keepalive issue?
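
For reference, this is roughly what TCP keepalive does at the socket level. It is only a sketch of the general mechanism, not walreceiver code: the helper name `enable_keepalive` and the timing values are made up for illustration, and the three tuning options are platform-dependent, so they are applied only where the Python `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket (hypothetical helper).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are not available on every platform, so look them up
    # by name and skip the ones this build of Python does not provide.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

Without something like this (or an application-level heartbeat), a receiver blocked on a read from a peer that vanished without closing the connection can wait for a very long time, which would match the behaviour described above.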

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master. So far I've only been able to reproduce this
> with fsync=off, but I believe it's possible anyway,

... and some users will turn fsync off. This is, in fact, one of the
primary uses for streaming replication: Durability via replicas.

> and this just
> makes it more likely. After the most recent crash, the master thought
> pg_current_xlog_location() was 1/86CD4000; the slave thought
> pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to
> the master, the slave then thought that
> pg_last_xlog_receive_location() was 1/87000000.

So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
have actually prevented the slave from being corrupted.
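
To make the comparison concrete, here is a minimal sketch of the check using the three locations quoted above. It assumes only that a location is printed as two hexadecimal fields separated by a slash and that locations order by the first field, then the second; the function names are hypothetical and this is not how the server implements it.

```python
def parse_xlog_location(text):
    """Split a 'hex/hex' xlog location into a tuple that compares in log order."""
    xlogid, xrecoff = text.strip().split("/")
    return (int(xlogid, 16), int(xrecoff, 16))


def standby_is_ahead(master_current, standby_received):
    """True if the standby has received WAL beyond the master's current position."""
    return parse_xlog_location(standby_received) > parse_xlog_location(master_current)


if __name__ == "__main__":
    master = "1/86CD4000"        # pg_current_xlog_location() on the master
    slave_before = "1/8733C000"  # pg_last_xlog_receive_location() before reconnect
    slave_after = "1/87000000"   # the same function after reconnect

    # The slave had received WAL that the recovered master no longer has.
    print(standby_is_ahead(master, slave_before))
    # After reconnecting, the slave's receive location moved backwards.
    print(parse_xlog_location(slave_after) < parse_xlog_location(slave_before))
```

Both checks come out true for the values in the report: the slave was ahead of the recovered master, and its receive location regressed after it reconnected.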

My question, though: is detecting out-of-sequence xlogs *enough*? Are
there any crash conditions on the master which would cause the master to
reuse the same locations for different records, for example? I don't
think so, but I'd like to be certain.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
