On 01/09/10 10:53, Fujii Masao wrote:
> Before discussing about that, we should determine whether registering
> standbys in master is really required. It affects configuration a lot.
> Heikki thinks that it's required, but I'm still unclear about why and
> Why do standbys need to be registered in master? What information
> should be registered?
That requirement falls out from the handling of disconnected standbys.
If a standby is not connected, what does the master do with commits? If
the answer is anything other than acknowledging them to the client
immediately, as if the standby never existed, the master needs to know
what standby servers exist. Otherwise it can't know if all the standbys
are connected or not.
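
To make the argument concrete, here is a minimal sketch, assuming the master keeps a set of registered standby names and a set of currently connected ones. The function names and the set-based representation are hypothetical and only illustrate the reasoning, not any actual PostgreSQL code.

```python
def missing_standbys(registered, connected):
    """Return the registered standbys that are not currently connected."""
    return set(registered) - set(connected)


def all_standbys_connected(registered, connected):
    """True only if every registered standby is connected right now."""
    return not missing_standbys(registered, connected)
```

Without the `registered` set, the master only ever sees `connected`, so a standby that has dropped off is indistinguishable from one that never existed.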
>> What does synchronous replication mean, when is a transaction
>> acknowledged as committed?
> I proposed four synchronization levels:
> 1. async
> doesn't make transaction commit wait for replication, i.e.,
> asynchronous replication. This mode has been already supported in
> 2. recv
> makes transaction commit wait until the standby has received WAL
> 3. fsync
> makes transaction commit wait until the standby has received and
> flushed WAL records to disk
> 4. replay
> makes transaction commit wait until the standby has replayed WAL
> records after receiving and flushing them to disk
> OTOH, Simon proposed the quorum commit feature. I think that both
> are required for our various use cases. Thoughts?
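
The four levels quoted above can be sketched as a check against the positions a standby reports. This is an illustration only: `SyncLevel`, `StandbyProgress`, and plain integers standing in for WAL positions are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SyncLevel(Enum):
    ASYNC = "async"
    RECV = "recv"
    FSYNC = "fsync"
    REPLAY = "replay"


@dataclass
class StandbyProgress:
    received: int  # last WAL position received by the standby
    flushed: int   # last WAL position flushed to disk on the standby
    replayed: int  # last WAL position replayed on the standby


def commit_can_return(level, commit_lsn, progress):
    """Decide whether a commit at commit_lsn may stop waiting for this standby."""
    if level is SyncLevel.ASYNC:
        return True  # never waits for replication
    if level is SyncLevel.RECV:
        return progress.received >= commit_lsn
    if level is SyncLevel.FSYNC:
        return progress.flushed >= commit_lsn
    return progress.replayed >= commit_lsn  # SyncLevel.REPLAY
```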
I'd like to keep this as simple as possible, yet flexible so that with
enough scripting and extensions, you can get all sorts of behavior. I
think quorum commit falls into the "extension" category; if your setup
is complex enough, it's going to be impossible to represent that in our
config files no matter what. But if you write a little proxy, you can
implement arbitrary rules there.
I think recv/fsync/replay should be specified in the standby. It has no
direct effect on the master, the master would just relay the setting to
the standby when it connects, or the standby would send multiple
XLogRecPtrs and let the master decide when the WAL is persistent enough.
And what if you write a proxy that has some other meaning of "persistent
enough"? Like when it has been written to the OS buffers but not yet
fsync'd, or when it has been fsync'd to at least one standby and
received by at least three others. recv/fsync/replay is not going to
represent that behavior well.
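
As a rough sketch of the kind of rule such a proxy could apply, the function below implements the second example: fsync'd on at least one standby and received by at least three others. The report format (a dict mapping a standby name to a `(received, flushed)` pair of integer positions) is hypothetical, and the sketch assumes a standby never reports a flushed position ahead of its received position.

```python
def persistent_enough(commit_lsn, reports):
    """Custom durability rule: one standby has flushed, three others have received.

    reports maps standby name -> (received, flushed) WAL positions.
    Assumes flushed <= received for every standby.
    """
    received = sum(1 for recv, _ in reports.values() if recv >= commit_lsn)
    flushed = sum(1 for _, flush in reports.values() if flush >= commit_lsn)
    # One flushed standby plus three *other* standbys that have at least
    # received the WAL means four standbys in total have received it.
    return flushed >= 1 and received >= 4
```

A rule like this needs the standbys to report several positions at once, which is why a single recv/fsync/replay setting does not capture it.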
"sync vs async" on the other hand should be specified in the master,
because it has a direct impact on the behavior of commits in the master.
I propose a configuration file standbys.conf, in the master:
# STANDBY NAME     SYNCHRONOUS   TIMEOUT
importantreplica   yes           100ms
tempcopy           no            10s
Or perhaps this should be stored in a system catalog.
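
For illustration, a file in the proposed layout could be read as follows. This is only a sketch: the proposal does not define a grammar, so the whitespace-separated fields, the `#` comment handling, and the `ms`/`s` timeout units are assumptions inferred from the two sample lines, and `StandbyEntry` and `parse_standbys_conf` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class StandbyEntry:
    name: str
    synchronous: bool
    timeout_ms: int


_TIMEOUT_RE = re.compile(r"^(\d+)(ms|s)$")


def parse_standbys_conf(text):
    """Parse 'name yes|no timeout' lines; '#' starts a comment."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sync, timeout = line.split()  # ValueError on a malformed line
        if sync not in ("yes", "no"):
            raise ValueError(f"bad SYNCHRONOUS value: {sync!r}")
        match = _TIMEOUT_RE.match(timeout)
        if match is None:
            raise ValueError(f"bad TIMEOUT value: {timeout!r}")
        value, unit = int(match.group(1)), match.group(2)
        timeout_ms = value * 1000 if unit == "s" else value
        entries.append(StandbyEntry(name, sync == "yes", timeout_ms))
    return entries
```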
>> What to do if a standby server dies and never
>> acknowledges a commit?
> The master's reaction to that situation should be configurable. So
> I'd propose new configuration parameter specifying the reaction.
> Valid values are:
> - standalone
> When the master has waited for the ACK much longer than the timeout
> (or detected the failure of the standby), it closes the connection
> to the standby and restarts transactions.
> - down
> When that situation occurs, the master shuts down immediately.
> Though this is unsafe for the system requiring high availability,
> as far as I recall, some people wanted this mode in the previous
Yeah, though of course you might want to set that per-standby too.
Let's step back a bit and ask what would be the simplest thing that you
could call "synchronous replication" in good conscience, and also be
useful at least to some people. Let's leave out the "down" mode, because
that requires registration. We'll probably have to do registration at
some point, but let's take as small steps as possible.
Without the "down" mode in the master, frankly I don't see the point of
the "recv" and "fsync" levels in the standby. Either way, when the
master acknowledges a commit to the client, you don't know if it has
made it to the standby yet because the replication connection might be
down for some reason.
That leaves us the 'replay' mode, which *is* useful, because it gives
you the guarantee that when the master acknowledges a commit, it will
appear committed in all hot standby servers that are currently
connected. With that guarantee you can build a reliable cluster with
something like pgpool-II where all writes go to one node, and reads are
distributed to multiple nodes.
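
The 'replay' guarantee without registration can be sketched like this, assuming the master tracks the last replayed position reported by each currently connected standby. The dict-based bookkeeping and the function name are hypothetical.

```python
def commit_visible_on_connected_standbys(commit_lsn, replayed_by_standby):
    """True once every *currently connected* standby has replayed commit_lsn.

    replayed_by_standby maps standby name -> last replayed WAL position.
    With no standby connected the result is True: the master does not know
    about, and therefore does not wait for, disconnected standbys.
    """
    return all(pos >= commit_lsn for pos in replayed_by_standby.values())
```

Note the empty case: with no registration, a standby that is disconnected simply drops out of the check, which is exactly the limitation discussed above.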
I'm not sure what we should aim for in the first phase. But if you want
as little code as possible yet have something useful, I think 'replay'
mode with no standby registration is the way to go.