Re: Review of Synchronous Replication patches

From: zb(at)cybertec(dot)at
To: "Yeb Havinga" <yebhavinga(at)gmail(dot)com>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, hs(at)cybertec(dot)at
Subject: Re: Review of Synchronous Replication patches
Date: 2010-07-24 07:40:17
Message-ID: 45f2f88591b7bc77ba03839929736061.squirrel@internal.cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> ... Off the list I've received word from Zoltan
> that work on a new patch is planned. It would be great if ideas from
> both patches could be merged into one.

...

> * Does it follow SQL spec, or the community-agreed behavior?
> A: Unknown, though the choices in guc parameters suggest the patch's
> been made to support actual use cases.
> B: Design of patch B has been discussed on the mailing list as well as
> the wiki (for links I refer to my previous mail)

> * Have all the bases been covered?
> Patch A seems to cover two-phase commit where patch B does not. (I'm
> time constrained and currently do not have a replicated cluster with
> patch B anymore, so I cannot test).

...

> Differences:
> Patch A synchronizes by sending back the Xids that have been received
> and written to disk. When the master receives the xids, the respective
> backends with having those xids active are unlocked and signalled to
> continue. This means some locking of the procarray. The libpq protocol
> was adapted so a client (the walreceiver) can report back the xids.
> Patch B synchronizes by waiting to send wal data, until the sync
> replicas have sending back LSN pointers in the wal files they have
> synced to, in one of three ways. The libpq protocol was adapted so the
> walreceiver can report back WAL file positions.
>
> Perhaps the weakness of both patches is that they are not one. In my
> opinion, both patches have their strengths. It would be great if these
> strenght could be unified in a single patch.
>
> patch A strengths:
> * design of the guc parameters - meaning of
> min_sync_replication_clients. As I understand it, it is not possible
> with patch B to define a minimum number of synchronous standby servers a
> transaction must be replicated to, before the commit succeeds. Perhaps
> the name could be changed (quorum_min_sync_standbys?), but in my opinion
> the definitive sync replication patch needs a parameter with this number
> and exact meaning. The 'synchronous_slave' guc in boolean is also a good
> one in my opinion.
>
> patch B strengths:
> * having the walsender synced by waiting for acked LSN, instead of
> signalling respective backends with a specific XID active, seems like a
> simpler and therefore better solution.
> * the three different ways when to sync (recv,flush and replay), and
> also that this is configurable per standby.
>
> In patch B there's probably some work to do in WaitXLogSend(), after a
> long day I'm having troubles understanding it's meaning but somehow I
> believe that a more intuitive set of guc parameters can be found,
> accompanied by clear code in this function. The way patch A synchronizes
> and count synchronous standbys for a commit is clearer.

This week I am on vacation but we discussed your review a little with
Hans-Jürgen Schönig and given the opinion and the consensus that sending
back LSNs is a better solution, I will withdraw my patch from the
commitfest page.

Instead, I will post a patch that unifies my configuration choices with
Fujii's patch. Do you have suggestions for better worded GUCs?
"slave" seems to be phased out by "standby" for political correctness, so
"synchronous_standby" instead of "synchronous_slave". You mentioned
"min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?

You also noticed that my patch addressed 2PC, maybe I will have to add
this part to Fujii's patch, too. Note: I haven't yet read his patch,
maybe working with LSNs instead of XIDs make this work automatically,
I don't know. We should definitely test.

Best regards,
Zoltán Böszörményi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2010-07-24 09:44:11 Re: Rewrite, normal execution vs. EXPLAIN ANALYZE
Previous Message Yeb Havinga 2010-07-24 06:56:18 Review of Synchronous Replication patches