Re: Quorum commit for multiple synchronous replication.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-11-14 08:39:36
Message-ID: CAD21AoBcD-LfaX-Fi_zSLBQ1anUo-0xCpJ=AK6xF1EtDcJ_1Mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>>> + return SyncRepGetSyncStandbysPriority(am_sync);
>>> + else /* SYNC_REP_QUORUM */
>>> + return SyncRepGetSyncStandbysQuorum(am_sync);
>>> Both routines share the same logic to detect if a WAL sender can be
>>> selected as a candidate for sync evaluation or not, still per the
>>> selection they do I agree that it is better to keep them as separate.
>>>
>>> + /* In quroum method, all sync standby priorities are always 1 */
>>> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>> + priority = 1;
>>> Honestly I don't understand why you are enforcing that. Priority can
>>> be important for users willing to switch from ANY to FIRST to have a
>>> look immediately at what are the standbys that would become sync or
>>> potential.
>>
>> I thought that since all standbys appearing in s_s_names list are
>> treated equally in quorum method, these standbys should have same
>> priority.
>> If these standby have different sync_priority, it looks like that
>> master server replicates to standby server based on priority.
>
> No actually, because we know that they are a quorum set, and that they
> work in the same set. The concept of priorities has no real meaning
> for quorum as there is no ordering of the elements. Another, perhaps
> cleaner idea may be to mark the field as NULL actually.

We know that but I'm concerned it might confuse the user.
If these priorities are the same, it can obviously imply that all of
the standby listed in s_s_names are handled equally.

>>> It is not possible to guess from how many standbys this needs to wait
>>> for. One idea would be to mark the sync_state not as "quorum", but
>>> "quorum-N", or just add a new column to indicate how many in the set
>>> need to give a commit confirmation.
>>
>> As Simon suggested before, we could support another feature that
>> allows the client to control the quorum number.
>> Considering adding that feature, I thought it's better to have and
>> control that information as a GUC parameter.
>> Thought?
>
> Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
> is not that much legitimate, users could just look at s_s_names to
> guess how many in hte set a commit needs to wait for.

It would be PGC_USRSET similar to synchronous_commit. The user can
specify it in statement level.

> + <para>
> + <literal>FIRST</> and <literal>ANY</> are case-insensitive word
> + and the standby name having these words are must be double-quoted.
> + </para>
> s/word/words/.
>
> + <literal>FIRST</> and <literal>ANY</> specify the method of
> + that how master server controls the standby servers.
> A little bit hard to understand, I would suggest:
> FIRST and ANY specify the method used by the master to control the
> standby servers.
>
> + The keyword <literal>FIRST</>, coupled with an integer
> + number N higher-priority standbys and makes transaction commit
> + when their WAL records are received.
> This is unclear to me. Here is a correction:
> The keyword FIRST, coupled with an integer N, makes transaction commit
> wait until WAL records are received fron the N standbys with higher
> priority number.
>
> + <varname>synchronous_standby_names</>. For example, a setting
> + of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits
> + wait until receiving receipts from at least any three standbys
> + of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>,
> This could just mention WAL records instead of "receipts".
>
> Instead of saying "an integer number N", we could use <literal>num_sync</>.
>
> + If <literal>FIRST</> or <literal>ANY</> are not specified,
> this parameter
> + behaves as <literal>ANY</>. Note that this grammar is
> incompatible with
> + <productname>PostgresSQL</> 9.6, where no keyword specified
> is equivalent
> + as if <literal>FIRST</> was specified. In short, there is no
> real need to
> + specify <replaceable class="parameter">num_sync</replaceable> as this
> + behavior does not have changed, as well as it is not
> necessary to mention
> + pre-9.6 versions are the multi-sync grammar has been added in 9.6.
> This paragraph could be reworked, say:
> if FIRST or ANY are not specified this parameter behaves as if ANY is
> used. Note that this grammar is incompatible with PostgreSQL 9.6 which
> is the first version supporting multiple standbys with synchronous
> replication, where no such keyword FIRST or ANY can be used. Note that
> the grammar behaves as if FIRST is used, which is incompatible with
> the post-9.6 behavior.
>
> + <entry>Synchronous state of this standby server. <literal>quorum-N</>
> + , where N is the number of synchronous standbys that transactions
> + need to wait for replies from, when standby is considered as a
> + candidate of quorum commit.</entry>
> Nitpicking: I think that the comma goes to the previous line if it is
> the first character of a line.
>
> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> + return SyncRepGetSyncStandbysPriority(am_sync);
> + else /* SYNC_REP_QUORUM */
> + return SyncRepGetSyncStandbysQuorum(am_sync)
> Or that?
> if (PRIORITY)
> return StandbysPriority();
> else if (QUORUM)
> return StandbysQuorum();
> else
> elog(ERROR, "Boom");
>
> + * In priority method, we need the oldest these positions among sync
> + * standbys. In quorum method, we need the newest these positions
> + * specified by SyncRepConfig->num_sync.
> Last sentence is grammatically incorrect, and it would be more correct
> to precise the Nth LSN positions to be able to select k standbys from
> a set of n ones.
>
> + SpinLockAcquire(&walsnd->mutex);
> + write_array[i] = walsnd->write;
> + flush_array[i]= walsnd->flush;
> + apply_array[i] = walsnd->flush;
> + SpinLockRelease(&walsnd->mutex);
> A nit: adding a space on the self of the second = character. And you
> need to save the apply position of the WAL sender, not the flush
> position in the array that is going to be ordered.
>
> /*
> * More easily understood version of standby state. This is purely
> - * informational, not different from priority.
> + * informational. In quorum method, we add the number to indicate
> + * how many in the set need to give a commit confirmation.
> */
> if (priority == 0)
> values[7] = CStringGetTextDatum("async");
> else if (list_member_int(sync_standbys, i))
> - values[7] = CStringGetTextDatum("sync");
> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum")
> This code block and its explanation comments tell two different
> stories. The comment is saying that something like "quorum-N" is used
> but the code always prints "quorum".
>
> It may be a good idea in the test to check that when no keywords is
> specified the group of standbys is in quorum mode.

Yeah, I will add some tests.

I will post new version patch incorporated other comments.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-11-14 08:52:04 Re: [PATCH] Allow TAP tests to be run individually
Previous Message Brad DeJong 2016-11-14 08:17:51 Re: WIP: Covering + unique indexes