Re: streaming replication breaks horribly if master crashes

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: streaming replication breaks horribly if master crashes
Date:	2010-06-16 20:56:53
Message-ID:	14597.1276721813@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.

TCP timeout is the answer there.
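
As an illustration of what a TCP timeout buys here, the following is a minimal sketch of enabling TCP keepalive on a client socket so that a vanished peer is noticed in bounded time instead of "forever". It is not the walreceiver's code: the helper names `enable_keepalive` and `worst_case_detection_seconds` and the default values are assumptions made for this example, and the per-socket tuning options are platform-specific (they exist on Linux), so the sketch skips any that are unavailable.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for sock (hypothetical helper, illustration only).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    # SO_KEEPALIVE itself is portable.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning knobs are not available everywhere; apply only those
    # that this platform exposes and accepts.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass  # option exists but is not settable here; keep system default
    return sock


def worst_case_detection_seconds(idle, interval, count):
    """Rough upper bound on how long a silently dead peer goes unnoticed."""
    return idle + interval * count


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    print("approx. detection time (s):", worst_case_detection_seconds(60, 10, 5))
    s.close()
```

With the assumed defaults, a peer that disappears without closing the connection would be detected after roughly 60 + 10 * 5 = 110 seconds; without keepalive, an idle receiver has no reason to ever notice.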

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.

So indeed we'd better change walsender to not get ahead of the fsync'd
position. And probably also warn people to not disable fsync on the
master, unless they're willing to write it off and fail over at any
system crash.
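
The rule proposed above can be sketched with a toy model. This is not PostgreSQL code: `ToyWal`, `sendable_upto`, and `stream` are hypothetical names, positions are plain byte counts, and a crash is assumed to lose everything that was written but not yet fsync'd. The sketch only shows why capping the send position at the flushed position keeps the slave from getting ahead of the master.

```python
class ToyWal:
    """Toy model of a master's WAL (illustration only)."""

    def __init__(self):
        self.written = 0  # bytes handed to the OS; may vanish in a crash
        self.flushed = 0  # bytes known to be durable (fsync'd)

    def write(self, nbytes):
        self.written += nbytes

    def fsync(self):
        self.flushed = self.written

    def crash(self):
        # Assumed worst case: all WAL that was not fsync'd is lost.
        self.written = self.flushed


def sendable_upto(wal, cap_at_flush=True):
    """Highest position the sender may stream to the slave."""
    return wal.flushed if cap_at_flush else wal.written


def stream(wal, slave_pos, cap_at_flush=True):
    """Return the slave's position after one round of streaming."""
    return max(slave_pos, sendable_upto(wal, cap_at_flush))


if __name__ == "__main__":
    wal = ToyWal()
    wal.write(100)
    wal.fsync()
    wal.write(50)  # written but not yet fsync'd

    uncapped = stream(wal, 0, cap_at_flush=False)  # streams unflushed WAL
    capped = stream(wal, 0, cap_at_flush=True)     # stops at fsync'd position

    wal.crash()
    print("master after crash:", wal.written)
    print("uncapped slave ahead of master:", uncapped > wal.written)
    print("capped slave ahead of master:", capped > wal.written)
```

In this model the capped sender trades a little replication lag for the guarantee that the slave never holds WAL the master can lose. With fsync disabled on the master the flushed position means nothing, which is why the warning about failing over after any system crash still applies.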

> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.

Doesn't seem tremendously insoluble from here ...

regards, tom lane

In response to

streaming replication breaks horribly if master crashes at 2010-06-16 19:47:11 from Robert Haas

Responses

Re: streaming replication breaks horribly if master crashes at 2010-06-16 23:08:44 from Greg Stark
