Re: Synchronous Log Shipping Replication

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: Synchronous Log Shipping Replication
Date: 2008-09-10 13:36:22
Message-ID: 20080910133622.GJ704@yugib.highrise.ca
Lists: pgsql-hackers

* Simon Riggs <simon(at)2ndQuadrant(dot)com> [080910 06:18]:

> We have a number of choices, at the point of failure:
> * Does the whole primary server stay up (probably)?

The only sane choice is the one the admin makes. Any "predetermined" choice
PG makes can (and will) be wrong in some situations.

> * Do we continue to allow new transactions in degraded mode? (which
> increases the risk of transaction loss if we continue at that time).
> (The answer sounds like it will be "of course, stupid" but this cluster
> may be part of an even higher level HA mechanism, so the answer isn't
> always clear).

The only sane choice is the one the admin makes. Any "predetermined" choice
PG makes can (and will) be wrong in some situations.

> * For each transaction that is trying to commit: do we want to wait
> forever? If not, how long? If we stop waiting, do we throw ERROR, or do
> we say, lets get on with another transaction.

The only sane choice is the one the admin makes. Any "predetermined" choice
PG makes can (and will) be wrong in some situations.

> If the server is up, yet all connections in a session pool are stuck
> waiting for their last commits to complete then most sysadmins would
> agree that the server is actually "down". Since no useful work is
> happening, or can be initiated - even read only. We don't need to
> address that issue in the same way for all transactions, is all I'm
> saying.

Sorry to sound like a broken record here, but the whole point is to
guarantee data safety. You can only start trading ACID for HA if you
have the ACID guarantees in the first place (and for replication, this
means across the cluster, including slaves).

So in that light, I think it's pretty obvious that if a slave is
considered part of an active synchronous replication cluster, then in the
face of "network lag", or even network failure, the master *must* pretty
much halt all new commits in their tracks until that slave acknowledges
the commit. Yes, that's going to cause a backup. That's the cost of
synchronous replication.
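
To make that concrete, here is a rough sketch in Python of the behaviour I
mean. This is a toy model, not PostgreSQL code: the class SyncCommitGate and
its methods are names invented for illustration. A commit only returns once
every slave in the active set has acknowledged it, and the only way out
during an outage is for the admin to take the slave out of the active set.

```python
import threading


class SyncCommitGate:
    """Toy model (hypothetical names): a commit completes only after every
    slave in the active synchronous set has acknowledged it."""

    def __init__(self, active_slaves):
        self._active = set(active_slaves)
        self._acked = {}  # commit position -> set of slaves that acked it
        self._cond = threading.Condition()

    def ack(self, slave, lsn):
        """Called when a slave acknowledges the commit at position lsn."""
        with self._cond:
            self._acked.setdefault(lsn, set()).add(slave)
            self._cond.notify_all()

    def abandon(self, slave):
        """Admin decision: the slave is no longer part of the active set."""
        with self._cond:
            self._active.discard(slave)
            self._cond.notify_all()

    def wait_for_commit(self, lsn, timeout=None):
        """Block until all active slaves have acked lsn.

        With timeout=None this waits forever, which is the strict
        synchronous behaviour. Returns False if the timeout expired first.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._active <= self._acked.get(lsn, set()),
                timeout=timeout,
            )
```

With slaves "a" and "b" active and only "a" having acked, wait_for_commit()
keeps blocking; it returns as soon as "b" acks or the admin calls
abandon("b").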

But that means the admin has to be able to control whether a slave is
part of an active synchronous replication cluster or not. I hope that
control eventually is a lot more than a GUC that says "when a slave is X
seconds behind, abandon him".

I'd dream of a "replication" interface where I could add new slaves on
the fly (and a nice tool that does pg_start_backup()/sync/apply WAL to
catch up, then subscribes), get slave status (maybe syncing/active/abandoned)
and some average latency (i.e. something like svctm of iostat on your WAL
disk), and some way to control the slave degradation from active to
abandoned (like the above GUC, or maybe a callout/hook/script that runs
when latency > X, etc, or both).
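
Something along these lines, sketched in Python purely to show the shape of
it. Everything here is hypothetical: SlaveState, SlaveStatus, the
max_avg_latency threshold and the on_slow hook are names I made up, not an
existing or proposed PostgreSQL API. A slave starts out syncing, becomes
active once caught up, and is degraded to abandoned when its average ack
latency crosses the threshold, unless the admin's hook says otherwise.

```python
from enum import Enum


class SlaveState(Enum):
    SYNCING = "syncing"
    ACTIVE = "active"
    ABANDONED = "abandoned"


class SlaveStatus:
    """Per-slave status record (illustrative only)."""

    def __init__(self, name, max_avg_latency, on_slow=None):
        self.name = name
        self.max_avg_latency = max_avg_latency  # seconds; the "X" threshold
        self.on_slow = on_slow  # optional hook: (status, avg) -> bool
        self.state = SlaveState.SYNCING
        self._latencies = []

    def mark_synced(self):
        """The slave has caught up and joins the active synchronous set."""
        if self.state is SlaveState.SYNCING:
            self.state = SlaveState.ACTIVE

    def average_latency(self):
        if not self._latencies:
            return 0.0
        return sum(self._latencies) / len(self._latencies)

    def record_ack_latency(self, seconds):
        """Record one ack latency sample and apply the degradation policy."""
        self._latencies.append(seconds)
        avg = self.average_latency()
        if self.state is SlaveState.ACTIVE and avg > self.max_avg_latency:
            # Without a hook, fall back to the plain threshold rule.
            # With a hook, the admin's callout decides.
            abandon = True if self.on_slow is None else bool(self.on_slow(self, avg))
            if abandon:
                self.state = SlaveState.ABANDONED
```

The point of the hook is that the threshold alone never makes the decision
when the admin has supplied a policy of their own.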

And for async replication, you just have a "proxy" slave which does
nothing but subscribe to your master, always acknowledge WAL right away
so the master doesn't wait, and keep a local backlog of the WAL it's
sending out to many clients. This proxy slave doesn't slow down the
master, but can feed clients across slow WAN links (that may not have
the burst bandwidth to keep up with bursty master writes, but have aggregate
bandwidth to keep pretty close to the master), or networks that drop out
for a period, etc.
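
As a rough illustration of the proxy idea (again a Python toy, with
ProxySlave and its methods being made-up names rather than anything that
exists): the proxy acks each WAL record the moment it has it, keeps a
backlog, and lets each downstream client drain that backlog at whatever
rate its link allows.

```python
class ProxySlave:
    """Toy proxy: acks the master immediately, buffers WAL for slow clients."""

    def __init__(self):
        self._backlog = []  # WAL records in arrival order
        self._sent = {}     # client name -> count of records already delivered

    def receive(self, record):
        """Store a record from the master and acknowledge right away."""
        self._backlog.append(record)
        return True  # the master never waits on downstream clients

    def subscribe(self, client):
        """Register a client; in this sketch it starts at the backlog's start."""
        self._sent[client] = 0

    def drain(self, client, max_records):
        """Hand the client up to max_records it has not seen yet."""
        start = self._sent[client]
        batch = self._backlog[start:start + max_records]
        self._sent[client] = start + len(batch)
        return batch

    def lag(self, client):
        """How many records the client is behind the proxy."""
        return len(self._backlog) - self._sent[client]
```

A client on a slow link just calls drain() with a small max_records; its
lag() grows during a burst and shrinks again afterwards, and the master
never notices. A real one would also have to trim the backlog at some
point, which this sketch ignores.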

--
Aidan Van Dyk                                   Create like a god,
aidan(at)highrise(dot)ca                        command like a king,
http://www.highrise.ca/                         work like a slave.
