Re: Standalone synchronous master

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 09:07:48
Message-ID: 52CD1564.6060504@vmware.com
Lists: pgsql-hackers

On 11/13/2013 03:09 PM, Rajeev rastogi wrote:
> This patch implements the following TODO item:
>
> Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
> This would require some type of command to be executed to alert administrators of this change.
> http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
>
> This patch implementation is in the same line as it was given in the earlier thread.
> Some Of the additional important changes are:
>
> 1. Have added two GUC variable to take commands from user to be executed
>
> a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
>
> 2. Master mode switch will happen only if the corresponding command executed successfully.
>
> 3. Taken care of replication timeout to decide whether synchronous standby has gone down. i.e. only after expiry of wal_sender_timeout, the master will switch from sync mode to standalone mode.
>
> Please provide your opinion or any other expectation out of this patch.

I'm going to say right off the bat that I think the whole notion of
automatically disabling synchronous replication when the standby goes down
is completely bonkers. If you don't need the strong guarantee that your
transaction is safe in at least two servers before it's acknowledged to
the client, there's no point enabling synchronous replication in the
first place. If you do need it, then you shouldn't fall back to a
degraded mode, at least not automatically. It's an idea that keeps
coming back, but I have not heard a convincing argument why it makes
sense. It's been discussed many times before, most recently in that
thread you linked to.

Now that I got that out of the way, I concur that some sort of hooks or
commands that fire when a standby goes down or comes back up make
sense, for monitoring purposes. I don't much like this particular
design. If you just want to write a log entry when all the standbys are
disconnected, running a shell command seems like an awkward interface.
It's OK for raising an alarm, but there are many other situations where
you might want to raise alarms, so I'd rather have us implement some
sort of a generic trap system, instead of adding this one particular
extra config option. What do people usually use to monitor replication?

There are two things we're trying to solve here: raising an alarm when
something interesting happens, and changing the configuration to
temporarily disable synchronous replication. What would be a good API to
disable synchronous replication? Editing the config file and SIGHUPing
is not very nice. There's been talk of an ALTER command to change the
config, but I'm not sure that's a very good API either. Perhaps expose
the sync_master_in_standalone_mode variable you have in your patch to
new SQL-callable functions. Something like:

pg_disable_synchronous_replication()
pg_enable_synchronous_replication()

I'm not sure where that state would be stored. Should it persist across
restarts? And you probably should get some sort of warnings in the log
when synchronous replication is disabled.
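
To make the shape of that interface concrete, here is a rough sketch in
plain C. It is not backend code: the two function names come from the
suggestion above, but their signatures, the bool return values, the
commit_must_wait_for_standby() helper and the use of stderr in place of
the server log are all assumptions made for illustration. The flag is a
stand-in for the patch's sync_master_in_standalone_mode variable and
lives in process memory here, so it does not answer the question of
where the state should be stored or whether it survives a restart.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the patch's sync_master_in_standalone_mode variable.
 * Assumption: kept in process memory only, so it is lost on restart. */
static bool sync_master_in_standalone_mode = false;

/* Hypothetical signature. Returns true if the state actually changed. */
bool
pg_disable_synchronous_replication(void)
{
    if (sync_master_in_standalone_mode)
        return false;           /* already disabled, nothing to do */

    sync_master_in_standalone_mode = true;

    /* stderr stands in for the server log in this sketch */
    fprintf(stderr,
            "WARNING: synchronous replication disabled; "
            "commits no longer wait for a standby\n");
    return true;
}

/* Hypothetical signature. Returns true if the state actually changed. */
bool
pg_enable_synchronous_replication(void)
{
    if (!sync_master_in_standalone_mode)
        return false;           /* already enabled, nothing to do */

    sync_master_in_standalone_mode = false;
    fprintf(stderr, "LOG: synchronous replication re-enabled\n");
    return true;
}

/* Hypothetical helper: would a commit have to wait for a standby? */
bool
commit_must_wait_for_standby(void)
{
    return !sync_master_in_standalone_mode;
}
```

Returning whether the state changed lets a caller tell a real transition
from a repeated call, which matters if the warning is also meant to
drive an alarm.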

In summary, more work is required to design a good
user/admin/programming interface. Let's hear a solid proposal for that,
before writing patches.

BTW, calling an external command with system(), while holding
SyncRepLock in exclusive mode, seems like a bad idea. For starters,
holding the lock will prevent a new WAL sender from starting up and
becoming a synchronous standby, and the external command might take a
long time to return.
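
One way to avoid that, sketched in plain C under stated assumptions: take
the lock only to record that a switch is in progress, release it, run
the command, then take the lock again to apply the result. The pthread
mutex is only a stand-in for SyncRepLock, and switch_to_standalone(),
in_standalone_mode() and the switch_in_progress flag are hypothetical
names, not anything from the patch. The rule that the mode changes only
if the command succeeds is taken from the patch description.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for SyncRepLock; the real lock is an LWLock in shared memory. */
static pthread_mutex_t sync_rep_lock = PTHREAD_MUTEX_INITIALIZER;

static bool standalone_mode = false;    /* protected by sync_rep_lock */
static bool switch_in_progress = false; /* protected by sync_rep_lock */

/*
 * Hypothetical helper: run cmd without holding the lock and switch to
 * standalone mode only if the command exits with status 0.
 * Returns true if the switch happened.
 */
bool
switch_to_standalone(const char *cmd)
{
    bool ok;

    /* Short critical section: only claim the switch. */
    pthread_mutex_lock(&sync_rep_lock);
    if (standalone_mode || switch_in_progress)
    {
        pthread_mutex_unlock(&sync_rep_lock);
        return false;
    }
    switch_in_progress = true;
    pthread_mutex_unlock(&sync_rep_lock);

    /* The lock is NOT held here, so a slow command blocks nobody else. */
    ok = (system(cmd) == 0);

    /* Short critical section: apply the outcome. A real implementation
     * would also recheck here whether a standby reconnected meanwhile. */
    pthread_mutex_lock(&sync_rep_lock);
    switch_in_progress = false;
    if (ok)
        standalone_mode = true;
    pthread_mutex_unlock(&sync_rep_lock);

    return ok;
}

/* Hypothetical helper: read the mode under the lock. */
bool
in_standalone_mode(void)
{
    bool result;

    pthread_mutex_lock(&sync_rep_lock);
    result = standalone_mode;
    pthread_mutex_unlock(&sync_rep_lock);
    return result;
}
```

The switch_in_progress flag keeps two callers from running the command
at the same time, which the exclusive lock would otherwise have
guaranteed.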

- Heikki
