| From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> | 
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> | 
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, fazool mein <fazoolmein(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Synchronous replication - patch status inquiry | 
| Date: | 2010-09-01 10:23:56 | 
| Message-ID: | 4C7E29BC.3020902@enterprisedb.com | 
| Lists: | pgsql-hackers | 
On 01/09/10 10:53, Fujii Masao wrote:
> Before discussing that, we should determine whether registering
> standbys in master is really required. It affects configuration a lot.
> Heikki thinks that it's required, but I'm still unclear about why and
> how.
>
> Why do standbys need to be registered in master? What information
> should be registered?
That requirement falls out of the handling of disconnected standbys. 
If a standby is not connected, what does the master do with commits? If 
the answer is anything other than acknowledging them to the client 
immediately, as if the standby never existed, the master needs to know 
what standby servers exist. Otherwise it can't know whether all the 
standbys are connected or not.
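
To make the argument above concrete, here is a minimal sketch of the decision the master faces. It is not PostgreSQL code; the function name `may_acknowledge_commit` and the idea of passing standby names as sets are assumptions made purely for illustration. Without a registry (`registered=None`) the master can only wait for the standbys it currently sees, so a disconnected standby is indistinguishable from one that never existed.

```python
from typing import Iterable, Optional


def may_acknowledge_commit(
    registered: Optional[Iterable[str]],
    connected: Iterable[str],
    acked: Iterable[str],
) -> bool:
    """Return True if the master may report the commit to the client.

    registered: names of all standbys the master knows about, or None
                if there is no registration at all (hypothetical model).
    connected:  names of standbys with a live replication connection.
    acked:      names of standbys that have acknowledged this commit.
    """
    # With no registry, the only standbys the master can wait for are
    # the ones that happen to be connected right now.
    expected = set(connected) if registered is None else set(registered)
    return expected <= set(acked)
```

With `registered={"a", "b"}` and only `a` connected, the commit is held back even though every connected standby has acknowledged it; with `registered=None` the same situation is acknowledged immediately.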
>> What does synchronous replication mean, when is a transaction
>> acknowledged as committed?
>
> I proposed four synchronization levels:
>
> 1. async
>    doesn't make transaction commit wait for replication, i.e.,
>    asynchronous replication. This mode has been already supported in
>    9.0.
>
> 2. recv
>    makes transaction commit wait until the standby has received WAL
>    records.
>
> 3. fsync
>    makes transaction commit wait until the standby has received and
>    flushed WAL records to disk
>
> 4. replay
>    makes transaction commit wait until the standby has replayed WAL
>    records after receiving and flushing them to disk
>
> OTOH, Simon proposed the quorum commit feature. I think that both
> are required for our various use cases. Thoughts?
I'd like to keep this as simple as possible, yet flexible so that with 
enough scripting and extensions, you can get all sorts of behavior. I 
think quorum commit falls into the "extension" category; if your setup 
is complex enough, it's going to be impossible to represent that in our 
config files no matter what. But if you write a little proxy, you can 
implement arbitrary rules there.

I think recv/fsync/replay should be specified in the standby. It has no 
direct effect on the master; the master would just relay the setting to 
the standby when it connects, or the standby would send multiple 
XLogRecPtrs and let the master decide when the WAL is persistent enough. 
And what if you write a proxy that has some other meaning of "persistent 
enough"? Like when it has been written to the OS buffers but not yet 
fsync'd, or when it has been fsync'd to at least one standby and 
received by at least three others. recv/fsync/replay is not going to 
represent that behavior well.
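
As a rough illustration of the "standby sends multiple XLogRecPtrs and the master decides" variant, the sketch below models each WAL position as a plain integer. The `StandbyProgress` class and `is_persistent_enough` function are hypothetical names, not part of any patch discussed here. A proxy with a different notion of "persistent enough" would replace the decision function with its own rule.

```python
from dataclasses import dataclass


@dataclass
class StandbyProgress:
    """WAL positions reported by one standby (modelled as integers)."""
    received: int  # last WAL position received from the master
    flushed: int   # last WAL position fsync'd to disk
    replayed: int  # last WAL position applied by recovery


def is_persistent_enough(level: str, commit_lsn: int,
                         progress: StandbyProgress) -> bool:
    """Decide whether a commit at commit_lsn satisfies the given level."""
    if level == "async":
        return True  # never wait for the standby
    if level == "recv":
        return progress.received >= commit_lsn
    if level == "fsync":
        return progress.flushed >= commit_lsn
    if level == "replay":
        return progress.replayed >= commit_lsn
    raise ValueError(f"unknown synchronization level: {level!r}")
```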
"sync vs async", on the other hand, should be specified in the master, 
because it has a direct impact on the behavior of commits in the master.

I propose a configuration file standbys.conf, in the master:

# STANDBY NAME    SYNCHRONOUS   TIMEOUT
importantreplica  yes           100ms
tempcopy          no            10s

Or perhaps this should be stored in a system catalog.
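
For illustration only, a file in the proposed standbys.conf layout could be read along these lines. The format is just a proposal in this message, so everything here is an assumption: the `StandbyEntry` type, the rule that `#` starts a comment, the `yes`/`no` values, and support for only the `ms` and `s` timeout units shown in the example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyEntry:
    name: str
    synchronous: bool
    timeout_ms: int


_UNIT_TO_MS = {"ms": 1, "s": 1000}


def parse_timeout(text: str) -> int:
    """Convert a value such as '100ms' or '10s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s)", text)
    if match is None:
        raise ValueError(f"bad timeout: {text!r}")
    return int(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def parse_standbys_conf(text: str) -> List[StandbyEntry]:
    """Parse the three-column layout: name, synchronous flag, timeout."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields: {raw_line!r}")
        name, sync, timeout = fields
        if sync not in ("yes", "no"):
            raise ValueError(f"bad SYNCHRONOUS value: {sync!r}")
        entries.append(StandbyEntry(name, sync == "yes", parse_timeout(timeout)))
    return entries
```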
>> What to do if a standby server dies and never
>> acknowledges a commit?
>
> The master's reaction to that situation should be configurable. So
> I'd propose new configuration parameter specifying the reaction.
> Valid values are:
>
> - standalone
>    When the master has waited for the ACK much longer than the timeout
>    (or detected the failure of the standby), it closes the connection
>    to the standby and restarts transactions.
>
> - down
>    When that situation occurs, the master shuts down immediately.
>    Though this is unsafe for the system requiring high availability,
>    as far as I recall, some people wanted this mode in the previous
>    discussion.
Yeah, though of course you might want to set that per-standby too.

Let's step back a bit and ask what would be the simplest thing that you 
could call "synchronous replication" in good conscience, and also be 
useful at least to some people. Let's leave out the "down" mode, because 
that requires registration. We'll probably have to do registration at 
some point, but let's take as small steps as possible.

Without the "down" mode in the master, frankly I don't see the point of 
the "recv" and "fsync" levels in the standby. Either way, when the 
master acknowledges a commit to the client, you don't know if it has 
made it to the standby yet because the replication connection might be 
down for some reason.
That leaves us the 'replay' mode, which *is* useful, because it gives 
you the guarantee that when the master acknowledges a commit, it will 
appear committed in all hot standby servers that are currently 
connected. With that guarantee you can build a reliable cluster with 
something like pgpool-II where all writes go to one node, and reads are 
distributed to multiple nodes.
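
The guarantee described above can be stated as a small predicate. This is a sketch under assumptions, not an implementation: WAL positions are plain integers, and `replayed_by_standby` is a hypothetical map from each currently connected standby to the position it has replayed. With no registration, an empty map means the commit is acknowledged at once.

```python
from typing import Mapping


def visible_on_all_connected(commit_lsn: int,
                             replayed_by_standby: Mapping[str, int]) -> bool:
    """True once every currently connected standby has replayed the commit.

    Disconnected standbys are simply absent from the mapping, so they
    never hold up an acknowledgement in this model.
    """
    return all(lsn >= commit_lsn for lsn in replayed_by_standby.values())
```

Once this returns true and the master acknowledges the commit, a read sent to any connected hot standby will see the committed data, which is what a load balancer that distributes reads relies on.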
I'm not sure what we should aim for in the first phase. But if you want 
as little code as possible yet have something useful, I think 'replay' 
mode with no standby registration is the way to go.
-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com