Re: Synchronization levels in SR

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronization levels in SR
Date: 2010-05-26 13:37:47
Message-ID: 1274881067.6203.3024.camel@ebony
Lists: pgsql-hackers

On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:

> > To summarise, I think we can get away with just 3 parameters:
> > synchronous_replication = N # similar in name to synchronous_commit
> > synch_rep_timeout = T
> > synch_rep_timeout_action = commit | abort
>
> I agree to add the latter two parameters, which are also listed on
> my outline of SynchRep.
> http://wiki.postgresql.org/wiki/Streaming_Replication#Synchronization_capability
>
> > Conceptually, this is "I want at least N replica copies made of my
> > database changes, I will wait for up to T milliseconds to get that
> > otherwise I will do X". Very easy and clear for an application to
> > understand what guarantees it is requesting. Also very easy for the
> > administrator to understand the guarantees requested and how to
> > provision for them: to deliver robustness they typically need N+1
> > servers, or for even higher levels of robustness and performance N+2
> > etc..
>
> I don't feel that "synchronous_replication" approach is intuitive for
> the administrator. Even on this thread, some people seem to prefer
> "per-standby" setting.

Maybe they do, but that is because nobody has yet explained how you
would handle failure modes with per-standby settings. When you do, they
will likely change their minds. Put the whole story on the table before
trying to force a decision.

> Without "per-standby" setting, when there are two standbys, one is in
> the near rack and another is in remote site, "synchronous_replication=1"
> cannot guarantee that the near standby is always synch with the master.
> So when the master goes down, unfortunately we might have to failover to
> the remote standby.

If the remote server responded first, then that proves it is a better
candidate for failover than the one you think of as near. If the
response times of the two standbys vary over time then you have network
problems that will directly affect the performance on the master;
synch_rep = N would respond better to any such problems.
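
To make the quorum behaviour concrete, here is a minimal sketch of the
"wait for N ACKs for up to T milliseconds, otherwise do X" rule. It is
an illustration only, not code from any patch: the function name
quorum_commit, the CommitResult type and the latency figures are all
hypothetical, and it models each standby by a single ACK latency (None
meaning the standby is down).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CommitResult:
    outcome: str          # "commit" or "abort"
    waited_ms: float      # how long the committing backend waited
    acked_by: List[str]   # standbys whose ACKs were counted
    timed_out: bool       # True if fewer than N ACKs arrived within T


def quorum_commit(ack_latency_ms: Dict[str, Optional[float]],
                  n: int,
                  timeout_ms: float,
                  timeout_action: str = "commit") -> CommitResult:
    """Hypothetical model of synchronous_replication = n,
    synch_rep_timeout = timeout_ms, synch_rep_timeout_action = ...

    ack_latency_ms maps a standby name to its ACK latency in ms,
    or None if that standby is unavailable.
    """
    if timeout_action not in ("commit", "abort"):
        raise ValueError("timeout_action must be 'commit' or 'abort'")
    if n <= 0:
        # No replica copies requested: nothing to wait for.
        return CommitResult("commit", 0.0, [], False)

    # ACKs that arrive within the timeout, fastest first.
    arrived = sorted(
        (latency, name)
        for name, latency in ack_latency_ms.items()
        if latency is not None and latency <= timeout_ms
    )

    if len(arrived) >= n:
        # The wait ends when the n-th ACK arrives, whichever
        # standbys happened to send the first n.
        first_n = arrived[:n]
        return CommitResult("commit", first_n[-1][0],
                            [name for _, name in first_n], False)

    # Quorum not reached: wait out the timeout, then apply the action.
    return CommitResult(timeout_action, timeout_ms,
                        [name for _, name in arrived], True)


# Assumed latencies: a near standby at 2 ms, a remote one at 40 ms.
standbys = {"near": 2.0, "remote": 40.0}
n1 = quorum_commit(standbys, n=1, timeout_ms=100.0)
n2 = quorum_commit(standbys, n=2, timeout_ms=100.0)

# The near standby goes down.
degraded = {"near": None, "remote": 40.0}
n1_down = quorum_commit(degraded, n=1, timeout_ms=100.0)
n2_down = quorum_commit(degraded, n=2, timeout_ms=100.0,
                        timeout_action="abort")
```

Under these assumed numbers, N = 1 waits only for the faster standby
and keeps committing (more slowly) when the near standby is lost, while
N = 2 always waits for the slower of the two and falls back to the
timeout action as soon as either one is unavailable.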

> OTOH, "synchronous_replication=2" degrades the
> performance on the master very much.

Yes, but only because you have only one near standby. It would clearly
be foolish to use this setting without 2+ near standbys. We would
then have 4 or more servers; how do we specify everything for that
config?

> "synchronous_replication" approach
> doesn't seem to cover the typical use case.

You described the failure modes for the quorum proposal, but avoided
describing the failure modes for the "per-standby" proposal.

Please explain what will happen when the near server is unavailable,
with per-standby settings. Please also explain what will happen if we
choose to have 4 or 5 servers to maintain performance in case of the
near server going down. How will we specify the failure modes?

> Also, when "synchronous_replication=1" and one of synchronous standbys
> goes down, how should the surviving standby catch up with the master?
> Such standby might be too far behind the master. The transaction commit
> should wait for the ACK from the lagging standby immediately even if
> there might be large gap? If yes, "synch_rep_timeout" would screw up
> the replication easily.

That depends upon whether we send the ACK at point #2, #3 or #4. It
would only cause a problem if you waited until #4.

I've explained the reasons for the proposals I've made so far: reduced
complexity in failure modes and better user control. To understand that
better, you or somebody else needs to explain how we would handle the
failure modes with "per-standby" settings so we can compare.

--
Simon Riggs www.2ndQuadrant.com
