Re: Quorum commit for multiple synchronous replication.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-11-08 13:12:56
Message-ID: CAB7nPqR8NTwzMx6H1sUy=Hp9ErgQZzB4W6Lhk125DnzNL_ZqMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> + return SyncRepGetSyncStandbysPriority(am_sync);
>> + else /* SYNC_REP_QUORUM */
>> + return SyncRepGetSyncStandbysQuorum(am_sync);
>> Both routines share the same logic to detect if a WAL sender can be
>> selected as a candidate for sync evaluation or not, still per the
>> selection they do I agree that it is better to keep them as separate.
>>
>> + /* In quroum method, all sync standby priorities are always 1 */
>> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> + priority = 1;
>> Honestly I don't understand why you are enforcing that. Priority can
>> be important for users willing to switch from ANY to FIRST to have a
>> look immediately at what are the standbys that would become sync or
>> potential.
>
> I thought that since all standbys appearing in s_s_names list are
> treated equally in quorum method, these standbys should have same
> priority.
> If these standby have different sync_priority, it looks like that
> master server replicates to standby server based on priority.

No actually, because we know that they are a quorum set, and that they
work in the same set. The concept of priorities has no real meaning
for quorum as there is no ordering of the elements. Another, perhaps
cleaner idea may be to mark the field as NULL actually.

>> It is not possible to guess from how many standbys this needs to wait
>> for. One idea would be to mark the sync_state not as "quorum", but
>> "quorum-N", or just add a new column to indicate how many in the set
>> need to give a commit confirmation.
>
> As Simon suggested before, we could support another feature that
> allows the client to control the quorum number.
> Considering adding that feature, I thought it's better to have and
> control that information as a GUC parameter.
> Thought?

Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
is not that much legitimate, users could just look at s_s_names to
guess how many in hte set a commit needs to wait for.

+ <para>
+ <literal>FIRST</> and <literal>ANY</> are case-insensitive word
+ and the standby name having these words are must be double-quoted.
+ </para>
s/word/words/.

+ <literal>FIRST</> and <literal>ANY</> specify the method of
+ that how master server controls the standby servers.
A little bit hard to understand, I would suggest:
FIRST and ANY specify the method used by the master to control the
standby servers.

+ The keyword <literal>FIRST</>, coupled with an integer
+ number N higher-priority standbys and makes transaction commit
+ when their WAL records are received.
This is unclear to me. Here is a correction:
The keyword FIRST, coupled with an integer N, makes transaction commit
wait until WAL records are received fron the N standbys with higher
priority number.

+ <varname>synchronous_standby_names</>. For example, a setting
+ of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits
+ wait until receiving receipts from at least any three standbys
+ of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>,
This could just mention WAL records instead of "receipts".

Instead of saying "an integer number N", we could use <literal>num_sync</>.

+ If <literal>FIRST</> or <literal>ANY</> are not specified,
this parameter
+ behaves as <literal>ANY</>. Note that this grammar is
incompatible with
+ <productname>PostgresSQL</> 9.6, where no keyword specified
is equivalent
+ as if <literal>FIRST</> was specified. In short, there is no
real need to
+ specify <replaceable class="parameter">num_sync</replaceable> as this
+ behavior does not have changed, as well as it is not
necessary to mention
+ pre-9.6 versions are the multi-sync grammar has been added in 9.6.
This paragraph could be reworked, say:
if FIRST or ANY are not specified this parameter behaves as if ANY is
used. Note that this grammar is incompatible with PostgreSQL 9.6 which
is the first version supporting multiple standbys with synchronous
replication, where no such keyword FIRST or ANY can be used. Note that
the grammar behaves as if FIRST is used, which is incompatible with
the post-9.6 behavior.

+ <entry>Synchronous state of this standby server. <literal>quorum-N</>
+ , where N is the number of synchronous standbys that transactions
+ need to wait for replies from, when standby is considered as a
+ candidate of quorum commit.</entry>
Nitpicking: I think that the comma goes to the previous line if it is
the first character of a line.

+ if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+ return SyncRepGetSyncStandbysPriority(am_sync);
+ else /* SYNC_REP_QUORUM */
+ return SyncRepGetSyncStandbysQuorum(am_sync)
Or that?
if (PRIORITY)
return StandbysPriority();
else if (QUORUM)
return StandbysQuorum();
else
elog(ERROR, "Boom");

+ * In priority method, we need the oldest these positions among sync
+ * standbys. In quorum method, we need the newest these positions
+ * specified by SyncRepConfig->num_sync.
Last sentence is grammatically incorrect, and it would be more correct
to precise the Nth LSN positions to be able to select k standbys from
a set of n ones.

+ SpinLockAcquire(&walsnd->mutex);
+ write_array[i] = walsnd->write;
+ flush_array[i]= walsnd->flush;
+ apply_array[i] = walsnd->flush;
+ SpinLockRelease(&walsnd->mutex);
A nit: adding a space on the self of the second = character. And you
need to save the apply position of the WAL sender, not the flush
position in the array that is going to be ordered.

/*
* More easily understood version of standby state. This is purely
- * informational, not different from priority.
+ * informational. In quorum method, we add the number to indicate
+ * how many in the set need to give a commit confirmation.
*/
if (priority == 0)
values[7] = CStringGetTextDatum("async");
else if (list_member_int(sync_standbys, i))
- values[7] = CStringGetTextDatum("sync");
+ values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+ CStringGetTextDatum("sync") : CStringGetTextDatum("quorum")
This code block and its explanation comments tell two different
stories. The comment is saying that something like "quorum-N" is used
but the code always prints "quorum".

It may be a good idea in the test to check that when no keywords is
specified the group of standbys is in quorum mode.

The code looks in good shape, I am still willing to run more advanced
tests manually.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-11-08 13:15:29 Re: [PATCH] Reload SSL certificates on SIGHUP
Previous Message Daniel Gustafsson 2016-11-08 13:06:29 Re: Radix tree for character conversion