Streaming replication and triggering failover

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming replication and triggering failover
Date: 2010-01-08 09:58:27
Message-ID: 4B4701C3.8050709@enterprisedb.com
Lists: pgsql-hackers

The trigger file logic feels a bit backwards. As the patch stands, when
the standby starts up, it retries connecting to the master server
indefinitely, until a connection is successfully established. Then it
streams until the connection breaks. If the connection is dropped
abruptly, because of a network problem or a crash of the master, the
standby again retries indefinitely.

If the master is shut down cleanly, the standby exits recovery mode and
starts up, unless the trigger file is present; in that case, the standby
waits for the file to be removed before finishing recovery.
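To make the "holdoff" reading concrete, here is a toy decision table for
the patch behaviour described above. It is only a sketch: the enum and
function names are hypothetical and do not come from the patch.

```python
from enum import Enum, auto


class Disconnect(Enum):
    """Why the standby is deciding its next step (hypothetical names)."""
    CONNECT_FAILED = auto()   # could not connect to the master yet
    ABRUPT = auto()           # network problem or master crash
    CLEAN_SHUTDOWN = auto()   # master was shut down cleanly


def next_step_under_patch(reason: Disconnect, trigger_file_present: bool) -> str:
    """Toy model of the standby's behaviour in the patch as described."""
    if reason is not Disconnect.CLEAN_SHUTDOWN:
        # Failed connection attempts and abrupt drops: retry forever,
        # whether or not the trigger file exists.
        return "retry connecting"
    if trigger_file_present:
        # The file holds off the end of recovery instead of causing it.
        return "wait for trigger file to be removed"
    return "finish recovery and start up"
```

The trigger file only matters in the clean-shutdown row, and there it
delays recovery rather than starting failover.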

So the trigger file is really a "holdoff file", like a safety catch on a
gun. At the very least it should be renamed, but I don't think that's a
very useful behavior anyway.

It doesn't seem wise to treat a clean shutdown of the master as a
signal to trigger failover. If you're setting up an HA system, that by
itself is not robust enough; you also need to trigger failover if the
master goes down unexpectedly, or if the standby was disconnected for
some reason when the master was shut down. Secondly, what if you want to
restart the master server without initiating failover? You'll have to
restart the standby too, to make it reconnect.

Let's have a default of no failover, and retry connecting to the master
indefinitely. When you *do* want to fail over, create the trigger file.
When the standby sees the trigger file, it should stop streaming, finish
replaying what it had streamed up to that point, and start up as the new
master.
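A rough simulation of the proposed loop, as a sketch under stated
assumptions: `run_standby`, the batch/`None` poll results and the
"replay one record per iteration" rate are all hypothetical
simplifications for illustration, not how the patch or PostgreSQL
actually structures recovery.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


def run_standby(
    polls: Iterable[Optional[List[str]]],
    trigger_file_exists: Callable[[], bool],
) -> List[str]:
    """Toy simulation of the proposed standby loop (not PostgreSQL code).

    Each item in `polls` is one hypothetical read attempt from the master:
    a batch of WAL record names, or None if the master was unreachable or
    the connection dropped (crash, network problem or clean shutdown alike).
    Returns a log of the standby's actions.
    """
    pending: Deque[str] = deque()  # streamed but not yet replayed
    log: List[str] = []
    for batch in polls:
        if trigger_file_exists():
            # Failover requested: stop streaming, replay what we already
            # have, then start up as the new master.
            while pending:
                log.append(f"replay {pending.popleft()}")
            log.append("promote")
            return log
        if batch is None:
            # Default is no failover: just keep retrying the master.
            log.append("retry")
        else:
            pending.extend(batch)
        # Assumption: replay is slower than streaming, one record per loop.
        if pending:
            log.append(f"replay {pending.popleft()}")
    return log  # still in recovery; a real standby would keep looping
```

In this model a clean shutdown of the master is just another `None`:
nothing except the trigger file ends recovery, and records that arrive
after the file is seen are never streamed.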

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
