Re: Streaming replication and triggering failover

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and triggering failover
Date: 2010-01-08 10:04:18
Message-ID: 9837222c1001080204k29e8feb0k61e23704a5582b43@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 10:58, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> The trigger file logic feels a bit backwards. As the patch stands, when
> the standby starts up, it retries connecting to the master server
> indefinitely, until a connection is successfully established. Then it
> streams until the connection breaks. If the connection is dropped
> abruptly, because of a network problem or crash in the master, standby
> retries indefinitely.
>
> If master is shut down cleanly, standby gets out of recovery mode, and
> starts up. Unless the trigger file is present; if it is, standby waits
> for it to go away before finishing recovery.
>
> So the trigger file is really a "holdoff file", like a safety catch on a
> gun. At the very least it should be renamed, but I don't think that's a
> very useful behavior anyway.
>
> It doesn't seem wise to consider a clean shutdown of the master as a
> signal to trigger failover. If you're setting up a HA system, that by
> itself is not robust enough; you also need to trigger failover if the
> master goes down unexpectedly, or if the standby was disconnected for
> some reason when the master was shut down. Secondly, what if you want to
> restart the master server, without initiating failover? You'll have to
> restart the standby too, to have it reconnect.
>
> Let's have a default of no failover, and retry connecting to the master
> indefinitely. When you *do* want to fail over, create the trigger file.
> When the standby sees the trigger file, it should stop streaming, finish
> up replaying what it had streamed up to that point, and start up as new
> master.

+1.

The default should be to "maintain the replication cluster", if for
no other reason than the principle of least surprise.

It would also agree with a well-established procedure, which is what
pg_standby does. Keeping the same basic behavior around something like
this can only be a good thing.
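
To make the proposed behavior concrete, here is a rough sketch of the
standby loop as described in the quoted text: retry the master connection
indefinitely, and fail over only when the trigger file shows up. This is
not code from the patch. The function name, the callbacks (try_connect,
stream_batch, replay_pending) and the max_polls parameter are all
hypothetical, and max_polls exists only so the sketch can terminate when
experimenting with it.

```python
import os
import time
from typing import Callable, Optional


def run_standby(
    trigger_file: str,
    try_connect: Callable[[], bool],
    stream_batch: Callable[[], bool],
    replay_pending: Callable[[], None],
    retry_delay: float = 0.0,
    max_polls: Optional[int] = None,
) -> str:
    """Hypothetical standby loop: stay in recovery until the trigger file exists.

    try_connect    returns True once a connection to the master is established.
    stream_batch   streams some WAL; returns False if the connection was lost.
    replay_pending replays whatever has been streamed but not yet applied.
    """
    connected = False
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1

        # Failover is requested explicitly, never inferred from a
        # disconnect or from a clean shutdown of the master.
        if os.path.exists(trigger_file):
            replay_pending()  # finish replaying what was already streamed
            return "promoted"

        if not connected:
            # Default behavior: keep retrying the master indefinitely.
            connected = try_connect()
            if not connected:
                time.sleep(retry_delay)
            continue

        # Connected: keep streaming; a lost connection sends us back to retrying.
        connected = stream_batch()

    # Reached only when max_polls is set (an assumption added for this sketch).
    return "still_standby"
```

With this shape, restarting the master needs no action on the standby: it
drops back into the retry branch and reconnects on its own, and only the
appearance of the trigger file ends recovery.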

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
