Re: Synchronization levels in SR

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronization levels in SR
Date: 2010-05-26 16:55:30
Message-ID: 4BFD5282.7030501@enterprisedb.com
Lists: pgsql-hackers

On 26/05/10 18:31, Robert Haas wrote:
> And frankly, I don't think it's possible for quorum commit to reduce
> the number of parameters. Even if we have that feature available, not
> everyone will want to use it. And the people who don't will
> presumably need whatever parameters they would have needed if quorum
> commit hadn't been available in the first place.

Agreed, quorum commit is not a panacea.

For example, suppose that you have two servers, master and a standby,
and you want transactions to be synchronously committed to both, so that
in the event of a meteor striking the master, you don't lose any
transactions that have been replied to the client as committed.

Now you want to set up a temporary replica of the master at a
development server, for testing purposes. If you set quorum to 2, your
development server becomes critical infrastructure, which is not what
you want. If you set quorum to 1, it also becomes critical
infrastructure, because it's possible that a transaction has been
replicated to the test server but not the real production standby, and a
meteor strikes.
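
To make that concrete, here is a minimal Python sketch. It is a toy model, not PostgreSQL code: the server names and the helper functions are made up for illustration, and "quorum" is assumed to count acknowledgements from standbys only. It enumerates which sets of acknowledgements would let a commit return to the client.

```python
from itertools import combinations

# Hypothetical standby names, for illustration only.
STANDBYS = ["prod_standby", "test_replica"]

def commit_acknowledged(acks, quorum):
    """Toy rule: the commit returns once `quorum` standbys have acked."""
    return len(acks) >= quorum

def ack_sets_allowing_commit(quorum):
    """All subsets of standbys whose acks alone satisfy the quorum."""
    result = []
    for n in range(len(STANDBYS) + 1):
        for subset in combinations(STANDBYS, n):
            if commit_acknowledged(subset, quorum):
                result.append(set(subset))
    return result

# quorum=2: every commit needs the test replica, so it is critical.
needs_test_q2 = all("test_replica" in s for s in ack_sets_allowing_commit(2))

# quorum=1: a commit can return while only the test replica holds it.
test_only_q1 = {"test_replica"} in ack_sets_allowing_commit(1)

print(needs_test_q2, test_only_q1)
```

Both flags come out true in this model, which is the dilemma described above: with either quorum value, the test replica ends up mattering for durability.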

Per-standby settings would let you express that, but OTOH they cannot
express the quorum behavior where you require N out of M standbys to
acknowledge the commit before returning to the client.

There's really no limit to how complex a setup can be. For example,
imagine that you have two data centers, with two servers in each. You
want to replicate the master to all four servers, but for commit to
return to the client, it's enough that the transaction has been
replicated to one server in each data center. How do you express that in
the config file? And it would be nice to have per-transaction control
too, like with synchronous_commit...
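
As a rough illustration of why a single quorum number is not enough for the two-data-center case, here is a small Python sketch. The topology, the names and both functions are assumptions made up for this example, not an existing or proposed PostgreSQL setting.

```python
# Hypothetical topology: standby name -> data center.
DATA_CENTER = {"a1": "dc_a", "a2": "dc_a", "b1": "dc_b", "b2": "dc_b"}

def one_per_data_center(acks):
    """True once at least one standby in every data center has acked."""
    covered = {DATA_CENTER[s] for s in acks}
    return covered == set(DATA_CENTER.values())

def plain_quorum(acks, n):
    """True once any n standbys have acked, regardless of location."""
    return len(acks) >= n

acks_split = {"a1", "b2"}      # one ack from each data center
acks_same_dc = {"a1", "a2"}    # two acks, both from dc_a

print(one_per_data_center(acks_split))
print(one_per_data_center(acks_same_dc))
print(plain_quorum(acks_same_dc, 2))
```

Two acks from the same data center satisfy a plain quorum of 2 but not the per-data-center rule, so the rule needs some notion of grouping on top of a count.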

So this is a tradeoff between
* flexibility, how complex a setup you can express?
* code complexity, how complicated is it to implement?
* user-friendliness, how easy is it to configure?

One way out of this is to implement something very simple in PostgreSQL,
and build external WAL proxying tools in pgfoundry that allow you to
cascade and disseminate the WAL in as complex scenarios as you want.

>> Your reply has again avoided the subject of how we would handle failure
>> modes with per-standby settings. That is important.
>
> I don't think anyone is avoiding that, we just haven't discussed it.
> The thing is, I don't think quorum commit actually does anything to
> address that problem. If I have a master and a standby configured for
> sync rep and the standby goes down, we have to decide what impact that
> has on the master. If I have a master and two standbys configured for
> sync rep with quorum commit such that I only need an ack from one of
> them, and they both go down, we still have to decide what impact that
> has on the master. I agree we need to talk about, but I don't agree
> that putting in quorum commit will remove the need to design that
> case.

Right, failure modes need to be discussed, but how quorum commit or
whatnot is configured is irrelevant to that.

No-one has come up with a scheme on how to abort a transaction if you
don't get a reply from a synchronous standby (or all standbys or a
quorum of standbys). Until someone does, a commit on the master will
have to always succeed. The "synchronous" aspect will provide a
guarantee that if a standby is connected, any transaction in the master
will become visible (or fsync'd or just streamed to, depending on the
level) on the standby too before it's acknowledged as committed to the
client, nothing more, nothing less.
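
A minimal Python sketch of that guarantee, assuming the simplest reading (wait for every synchronous standby that is currently connected; the function and standby names are hypothetical):

```python
def standbys_to_wait_for(configured_sync, connected):
    """Standbys whose ack a commit waits for: only those that are both
    configured as synchronous and currently connected."""
    return set(configured_sync) & set(connected)

def commit_can_return(configured_sync, connected, acked):
    """True when every standby we must wait for has acknowledged."""
    return standbys_to_wait_for(configured_sync, connected) <= set(acked)

# Standby connected but no ack yet: the client keeps waiting.
print(commit_can_return({"s1"}, connected={"s1"}, acked=set()))
# Standby disconnected: the commit on the master still succeeds.
print(commit_can_return({"s1"}, connected=set(), acked=set()))
```

The second call is the point: with no standby connected there is nothing to wait for, so the commit returns even though no standby has the transaction.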

One way to do that would be to refrain from flushing the commit record
to disk on the master until the standby has acknowledged it. The
downside is that the master is in a very severe state at that point:
until you flush the WAL, you can buffer only a small amount of WAL traffic
before you run out of wal_buffers, stalling all write activity in the
master, with backends waiting. You can't even shut down the server
cleanly. But if you value your transaction integrity much higher than
availability, maybe that's what you want.
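
The stall can be pictured with a toy Python model. This is a deliberate simplification, not how the WAL code works: the class, its methods and the sizes are invented for illustration, and the only assumption carried over from the text is that nothing can be flushed until the standby acknowledges.

```python
class ToyWalBuffer:
    """Toy stand-in for wal_buffers: WAL accumulates until a flush."""

    def __init__(self, capacity):
        self.capacity = capacity   # buffer size in bytes
        self.unflushed = 0         # bytes waiting for the standby's ack

    def try_write(self, nbytes):
        """Return True if the record fits, False if the backend would stall."""
        if self.unflushed + nbytes > self.capacity:
            return False
        self.unflushed += nbytes
        return True

    def standby_acked(self):
        """The ack arrived: the master may flush, which frees the buffer."""
        self.unflushed = 0

buf = ToyWalBuffer(capacity=64 * 1024)
written = 0
while buf.try_write(8 * 1024):     # keep writing until the buffer is full
    written += 1

print(written, "records buffered; further writes stall until the ack")
```

Once the buffer is full, every further `try_write` fails until `standby_acked` is called, which corresponds to all backends waiting on the standby.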

PS. I whole-heartedly agree with Simon's concern upthread that if we
allow a standby to specify in its config file that it wants to be a
synchronous standby, that's a bit dangerous because connecting such a
standby to the master will suddenly make all commits on the master a lot
slower. Adding a synchronous standby should require some action in the
master, since it affects the behavior on master.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
