Re: Sync Rep Design

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: greg(at)2ndQuadrant(dot)com, Josh Berkus <josh(at)postgresql(dot)org>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2010-12-31 12:40:58
Message-ID: 4D1DCF5A.7070808@enterprisedb.com
Lists: pgsql-hackers

On 31.12.2010 13:48, Simon Riggs wrote:
> On Fri, 2010-12-31 at 12:06 +0200, Heikki Linnakangas wrote:
>
>> Regarding the rest of the proposal, I would still prefer the UI
>> discussed here:
>>
>> http://archives.postgresql.org/message-id/4CAE030A.2060701@enterprisedb.com
>>
>> It ought to be the same amount of work to implement, and provides the
>> same feature set, but makes administration a bit easier by being able to
>> name the standbys. Also, I dislike the idea of having the standby
>> specify that it's a synchronous standby that the master has to wait for.
>> Behavior on the master should be configured on the master.
>
> Good point; I've added the people on the copy list from that post. This
> question is the key, so please respond after careful thought on my
> points below.
>
> There are ways to blend together the two approaches, discussed later,
> though first we need to look at the reasons behind my proposals.
>
> I see significant real-world issues with configuring replication using
> multiple named servers, as described in the link above:

All of these points only apply to specifying *multiple* named servers in
the synchronous_standbys='...' list. That's certainly a more complicated
scenario, and the configuration is more complicated as a result. With
your proposal, it's not possible in the first place.

Multiple synchronous standbys probably aren't needed by most people, so
I'm fine with leaving that out for now, keeping the design the same
otherwise. I included it in the proposal because it easily falls out of
the design. So, if you're worried about the complexities of multiple
synchronous standbys, let's keep the UI exactly the same as what I
described in the link above, but only allow one name in the
synchronous_standbys setting, instead of a list.

> 3. Administrative complexity just jumped a huge amount.
>
> (a) If you add or remove servers to the config you need to respecify all
> the parameters, which need to be specific to the exact set of servers.

Hmm, this could be alleviated by allowing the master to have a name too.
All the configs could then be identical, except for the unique name for
each server. For example, for a configuration with three servers that
are all synchronous with each other, each server would have
"synchronous_standbys='server1, server2, server3'" in the config file.
The master would simply ignore the entry for itself.
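
To illustrate the shared-config idea above, here is a minimal sketch, assuming a comma-separated synchronous_standbys value and a per-server name. The function name standbys_to_wait_for and the parsing rules are hypothetical, not part of any PostgreSQL code or of the proposal itself:

```python
def standbys_to_wait_for(own_name, synchronous_standbys):
    """Hypothetical helper: which names would this server wait for as master?"""
    # Split the shared comma-separated setting into trimmed server names.
    names = [n.strip() for n in synchronous_standbys.split(",") if n.strip()]
    # A server acting as master skips the entry that names itself.
    return [n for n in names if n != own_name]


# The same setting is used on every server; only the server's own name differs.
shared_setting = "server1, server2, server3"
for server in ("server1", "server2", "server3"):
    print(server, "would wait for", standbys_to_wait_for(server, shared_setting))
```

Whichever server ends up as master derives its own wait list from the identical setting, so adding or removing a server means editing one shared line rather than a different list per server.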

> (b) After failover, the list of synchronous_standbys needs to be
> re-specified, yet what is the correct list of servers? The only way to
> make that config work is with complex middleware that automatically
> generates new config files.

It depends on what you want. I think you're envisioning that the
original server is taken out of the system and not waited for, meaning
that you accept a lower level of persistence after failover. Yes, then
you need to change the config. Or more likely you prepare the config
file in the standby that way to begin with.

> I don't think that is "the same amount of
> work to implement", it's an order of magnitude harder overall.

I meant it's the same amount of work to implement the feature in
PostgreSQL. No doubt maintaining such a setup in production is more
complicated.

> 5. Requesting sync from more than one server performs poorly, since you
> must wait for additional servers. If there are sporadic or systemic
> network performance issues you will be badly hit by them. Monitoring
> that just got harder also. First-response-wins is more robust in the
> case of volatile resources since it implies responsiveness to changing
> conditions.
>
> 6. You just lost the ability to control performance on the master, with
> a userset. Performance is a huge issue with sync rep. If you can't
> control it, you'll simply turn it off. Having a feature that we daren't
> ever use because it performs poorly helps nobody. This is not a tick-box
> in our marketing checklist, I want it to be genuinely real-world usable.

You could make synchronous_standbys a user-settable GUC, just like your
proposed boolean switch. You could then control on a per-transaction
basis which servers you want to wait to respond. Although perhaps it
would be more user-friendly to just have an additional boolean GUC,
similar to synchronous_commit=on/off. Or maybe synchronous_commit is
enough to control that.

> I suppose we might regard the feature set I am proposing as being the
> same as making synchronous_standbys a USERSET parameter, and allowing
> just two options:
> "none" - allowing the user to specify async if they wish it
> "*" - allowing people to specify that syncing to *any* standby is
> acceptable
>
> We can blend the two approaches together, if we wish, by having two
> parameters (plus server naming)
> synchronous_replication = on | off (USERSET)
> synchronous_standbys = '...'
> If synchronous_standbys is not set and synchronous_replication = on then
> we sync to any standby. If synchronous_replication = off then we use
> async replication, whatever synchronous_standbys is set to.
> If synchronous_standbys is set, then we use sync rep to all listed
> servers.

Sounds good.

I still don't like the synchronous_standbys='' and
synchronous_replication=on combination, though. IMHO that still amounts
to letting the standby control the behavior on master, and it makes it
impossible to temporarily add an asynchronous standby to the mix. I
could live with it, since you wouldn't be forced to use it that way after all,
but I would still prefer to throw an error on that combination. Or at
least document the pitfalls and recommend always naming the standbys.
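
The blended rules quoted above, plus the option of rejecting the unnamed combination, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function commit_wait_policy, the mode labels "async", "any" and "all", and the strict flag are all hypothetical names invented for this example:

```python
def commit_wait_policy(synchronous_replication, synchronous_standbys, strict=False):
    """Hypothetical sketch of the two-parameter rules discussed in this thread."""
    # Parse the comma-separated standby list; an empty string means "not set".
    names = [n.strip() for n in synchronous_standbys.split(",") if n.strip()]

    # synchronous_replication = off always means async, whatever the list says.
    if not synchronous_replication:
        return ("async", [])

    if not names:
        # strict=True models the preference to throw an error on this combination.
        if strict:
            raise ValueError("synchronous_replication=on requires named standbys")
        # Otherwise: sync to any standby that happens to be connected.
        return ("any", [])

    # A non-empty list means waiting for all listed servers.
    return ("all", names)


print(commit_wait_policy(True, "server1, server2"))
print(commit_wait_policy(True, ""))
print(commit_wait_policy(False, "server1"))
```

With strict=False, the empty-list case means any standby that connects becomes synchronous, which is the pitfall described above: an asynchronous standby cannot be added temporarily without changing the master's behavior.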

> My proposal amounts to "let's add synchronous_standbys as a parameter in
> 9.2". If you really think that we need that functionality in this
> release, let's get the basic stuff added now and then fold in those ideas
> on top afterwards. If we do that, I will help. However, my only
> insistence is that we explain the above points very clearly in the docs
> to specifically dissuade people from using those features for typical
> cases.

Huh, wait, if you leave out synchronous_standbys, that's a completely
different UI again. I think we've finally reached agreement on how this
should be configured; let's stick to that, please.

(I would be fine with limiting synchronous_standbys to just one server
in this release though.)

> If you wondered why I ignored your post previously, it's because I
> understood that Fujii's post of 15 Oct, one week later, effectively
> accepted my approach, albeit with two additional parameters. That is the
> UI that I had been following.
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

That thread makes no mention of how to specify which standbys are
synchronous and which are not. It's about specifying the timeout and
whether to wait for a disconnected standby. Yeah, Fujii-san's proposal
seems reasonable for configuring that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
