Re: Synchronization levels in SR

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronization levels in SR
Date: 2010-05-27 09:01:04
Message-ID: 87ljb5iur3.fsf@hi-media-techno.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 26/05/10 23:31, Dimitri Fontaine wrote:
>> So if you want simplicity to admin, effective data availability and
>> precise control over the global setup, I say go for:
>> a. transaction level control of the replication level
>> b. cascading support
>> c. quorum with timeout
>> d. choice of commit or rollback at timeout
>>
>> Then give me a setup example that you can't express fully.
>
> One master, one synchronous standby on another continent for HA purposes,
> and one asynchronous reporting server in the same rack as the master. You
> don't want to set up the reporting server as a cascaded slave of the standby
> on the other continent, because that would double the bandwidth required,
> but you also don't want the master to wait for the reporting server.
>
> The possibilities are endless... Your proposal above covers a pretty good
> set of scenarios, but it's by no means complete. If we try to solve
> everything the configuration will need to be written in a Turing-complete
> Replication Description Language. We'll have to pick a useful,
> easy-to-understand subset that covers the common scenarios. To handle the
> more exotic scenarios, you can write a proxy that sits in front of the
> master, and implements whatever rules you wish, with the rules written
> in C.

Agreed on the Turing-completeness side of those things. My current
thinking is that the proxy I want might simply be a PostgreSQL instance
with cascading support. In your example that would give us:

                     Remote Standby, HA
   Master -- Proxy -<
                     Local Standby, Reporting
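
Just to make the bandwidth point explicit, here's a toy Python sketch
(purely illustrative; the node names and the "wan" flag are made up for
this example). Only one stream crosses the continent even though two
standbys consume the WAL:

    # Toy model of the proposed topology: each node streams from exactly
    # one upstream, flagged with whether that link crosses the WAN.
    topology = {
        "proxy":           {"upstream": "master", "wan": False},
        "remote_standby":  {"upstream": "proxy",  "wan": True},   # HA, other continent
        "local_reporting": {"upstream": "proxy",  "wan": False},  # async, same rack
    }

    # With the proxy, the master's WAL crosses the ocean exactly once.
    wan_streams = sum(1 for link in topology.values() if link["wan"])
    print("WAN streams:", wan_streams)   # -> 1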

So what I think we have here is a pretty good trade-off between simple
setup knobs and what you can express with them. What's still missing is
that with the quorum idea alone you can't tell whether the one server
that's synced is the remote or the local standby in this example.
Several ideas are floating around to address that (votes, mixed
per-standby and per-transaction settings).

Maybe we could allow a standby to declare that it's not interested in
participating in the quorum at all, that is, that it's an async
replica, full stop.

In your example we'd set the local reporting standby up as a non-voting
member of the replication setup, the proxy and the master would have a
quorum of 1, and the remote HA standby would vote.

I don't think the idea of giving any server a number of votes other
than 0 or 1 will help us in the least.
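
To illustrate, a rough Python sketch of what I have in mind (the
"votes" flag and the quorum count are hypothetical knobs from this
discussion, not anything that exists today):

    # Quorum with non-voting (pure async) members: a commit is only
    # acknowledged once enough *voting* standbys have confirmed it.
    standbys = {
        "remote_ha":       {"votes": 1, "acked": False},  # synchronous candidate
        "local_reporting": {"votes": 0, "acked": False},  # async, never counted
    }
    QUORUM = 1   # the master (or the proxy) waits for this many voting acks

    def commit_can_return(standbys, quorum=QUORUM):
        voting_acks = sum(s["votes"] for s in standbys.values() if s["acked"])
        return voting_acks >= quorum

    standbys["local_reporting"]["acked"] = True
    assert not commit_can_return(standbys)   # the reporting server never satisfies the quorum

    standbys["remote_ha"]["acked"] = True
    assert commit_can_return(standbys)       # only the voting standby releases the commit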

I do think that your proxy idea is a great one and should be in core.
By the way, the cascading/proxy instance could be set up without Hot
Standby, if you don't need to be able to monitor it via a libpq
connection and some queries.

> BTW, I think we're going to need a separate config file for listing the
> standbys anyway. There you can write per-server rules and options, but
> explicitly knowing about all the standbys also allows the master to recycle
> WAL as soon as it has been streamed to all the registered
> standbys. Currently we just keep wal_keep_segments files around, just in
> case there's a standby out there that needs them.

I much prefer that each server in the set publish what it wants; it
only connects to one given provider anyway. We've been talking about
this exact same retention problem for queueing solutions with Jan,
Marko and Jim.

The idea we came up with is a watermarking solution (which already
exists in Skytools 3, in its coarse-grained version). The first
approach is to have each slave report back to its local
master/provider/origin the last replayed WAL location (LSN), once in a
while. From that you derive a global watermark and drop WAL files
accordingly.
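
In code form the coarse-grained version is about this simple (a Python
sketch; LSNs are reduced to plain byte offsets, and the segment size
and reporting mechanics are assumptions for the example):

    # Each standby reports the last WAL position it has replayed; the
    # provider may only recycle segments every known standby has replayed.
    WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # default 16 MB segments

    last_replayed = {                      # latest report from each downstream server
        "remote_ha":       5 * WAL_SEGMENT_SIZE + 1234,
        "local_reporting": 3 * WAL_SEGMENT_SIZE + 99,
    }

    def global_watermark(reports):
        return min(reports.values())       # the slowest standby sets the watermark

    def removable_segments(reports, segment_size=WAL_SEGMENT_SIZE):
        return list(range(global_watermark(reports) // segment_size))

    print(removable_segments(last_replayed))   # -> [0, 1, 2]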

That leaves two problems: you can still run out of space, and why keep
that many files on the master anyway? Maybe some slave could be set up
for retention instead.

To solve that, each server could be set up with a restricted set of
servers from which it derives its watermark. That's when you need
per-server options and an explicit list of all the standbys, whatever
their level in the cascading tree, which means explicitly maintaining
the entire replication topology.
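
A quick sketch of that refinement, with the per-server source lists
invented purely for illustration:

    # Each provider derives its watermark only from an explicit subset of
    # standbys, so a designated retention server can lag behind without
    # forcing everybody else to keep WAL around.
    last_replayed = {"remote_ha": 500, "local_reporting": 300, "archive_keeper": 120}

    watermark_sources = {
        "master":         ["remote_ha", "local_reporting"],  # ignores the slow archive keeper
        "archive_keeper": [],                                # derives from nobody: keeps everything
    }

    def watermark_for(server, reports, sources):
        wanted = sources.get(server, list(reports))
        return min(reports[name] for name in wanted) if wanted else 0

    print(watermark_for("master", last_replayed, watermark_sources))          # -> 300
    print(watermark_for("archive_keeper", last_replayed, watermark_sources))  # -> 0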

I don't think we need to solve that yet. I think we need to provide an
option on each member of the replication tree to either PANIC or lose
WAL when it runs out of space while trying to follow the watermark.
It's crude, but it already allows having one standby maintain the
common archive while the master drops WAL files as soon as possible
(while still respecting wal_keep_segments).
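
The option could behave roughly like this sketch (the policy name and
the free-space check are invented for illustration only):

    # Per-server "out of space" policy: either refuse to go on (PANIC) or
    # drop retained WAL even though some standby has not replayed it yet.
    import shutil

    def handle_wal_overflow(wal_dir, policy, needed_bytes):
        if shutil.disk_usage(wal_dir).free >= needed_bytes:
            return "ok"                # still room: keep following the watermark
        if policy == "panic":
            raise RuntimeError("PANIC: out of space while retaining WAL for standbys")
        if policy == "drop":
            # Lagging standbys will have to fall back to the archive (or be
            # re-cloned), but this server keeps accepting writes.
            return "dropping retained WAL below the local watermark"
        raise ValueError("unknown policy: %s" % policy)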

In our case, if a WAL file is no longer available from any active
server, we still have the option to fetch it from the archives...

Regards,
--
Dimitri Fontaine
PostgreSQL DBA, Architect
