Re: Quorum commit for multiple synchronous replication.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-11-07 15:25:24
Message-ID: CAD21AoC9ZgYEJpw+75AJPM_NUbfqmSeG8q2EMP5ubKHM9dJ-zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Attached latest patch.
>> Please review it.
>
> Okay, so let's move on with this patch...

Thank you for reviewing this patch.

> + <para>
> + The keyword <literal>ANY</> is omissible, but note that there is
> + not compatibility between <productname>PostgreSQL</> version 10 and
> + 9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the
> + configuration with <literal>FIRST</> and <replaceable
> class="parameter">
> + num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6
> + or before. On the other hand, It's the same as the configuration with
> + <literal>ANY</> and <replaceable
> class="parameter">num_sync</> equal to
> + 1 in <productname>PostgreSQL</> 10 or later.
> + </para>
> This paragraph could be reworded:
> If FIRST or ANY are not specified, this parameter behaves as ANY. Note
> that this grammar is incompatible with PostgreSQL 9.6, where no
> keyword specified is equivalent as if FIRST was specified.
> In short, there is no real need to specify num_sync as this behavior
> does not have changed, as well as it is not necessary to mention
> pre-9.6 versions as the multi-sync grammar has been added in 9.6.

Fixed.

> - Specifying more than one standby name can allow very high availability.
> Why removing this sentence?
>
> + The keyword <literal>ANY</>, coupeld with an interger number N,
> s/coupeld/coupled/ and s/interger/integer/, for a double hit in one
> line, still...
>
> + The keyword <literal>ANY</>, coupeld with an interger number N,
> + chooses N standbys in a set of standbys with the same, lowest,
> + priority and makes transaction commit when WAL records are received
> + those N standbys.
> This could be reworded more simply, for example: The keyword ANY,
> coupled with an integer number N, makes transaction commits wait until
> WAL records are received from N connected standbys among those defined
> in the list of synchronous_standby_names.
>
> + <literal>s2</> and <literal>s3</> wil be considered as synchronous standby
> s/wil/will/
>
> + when standby is considered as a condidate of quorum commit.</entry>
> s/condidate/candidate/
>
> [... stopping here ...] Please be more careful with the documentation
> and comment grammar. There are other things in the patch..

I fix some typo as much as I found.

> A bunch of comments at the top of syncrep.c need to be updated.
>
> +extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
> +extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);
> Those two should be static routines in syncrep.c, let's keep the whole
> logic about quorum and higher-priority definition only there and not
> bother the callers of them about that.

Fixed.

> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> + return SyncRepGetSyncStandbysPriority(am_sync);
> + else /* SYNC_REP_QUORUM */
> + return SyncRepGetSyncStandbysQuorum(am_sync);
> Both routines share the same logic to detect if a WAL sender can be
> selected as a candidate for sync evaluation or not, still per the
> selection they do I agree that it is better to keep them as separate.
>
> + /* In quroum method, all sync standby priorities are always 1 */
> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> + priority = 1;
> Honestly I don't understand why you are enforcing that. Priority can
> be important for users willing to switch from ANY to FIRST to have a
> look immediately at what are the standbys that would become sync or
> potential.

I thought that since all standbys appearing in s_s_names list are
treated equally in quorum method, these standbys should have same
priority.
If these standby have different sync_priority, it looks like that
master server replicates to standby server based on priority.

> else if (list_member_int(sync_standbys, i))
> - values[7] = CStringGetTextDatum("sync");
> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
> The comment at the top of this code block needs to be refreshed.

Fixed.

> The representation given to the user in pg_stat_replication is not
> enough IMO. For example, imagine a cluster with 4 standbys:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 0 | async
> node_5434 | 0 | async
> node_5435 | 0 | async
> node_5436 | 0 | async
> (4 rows)
>
> If FIRST N is used, is it easy for the user to understand what are the
> nodes in sync:
> =# alter system set synchronous_standby_names = 'FIRST 2 (node_5433,
> node_5434, node_5435)';
> ALTER SYSTEM
> =# select pg_reload_conf();
> pg_reload_conf
> ----------------
> t
> (1 row)
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 1 | sync
> node_5434 | 2 | sync
> node_5435 | 3 | potential
> node_5436 | 0 | async
> (4 rows)
>
> In this case it is easy to understand that two nodes are required to be in sync.
>
> When using ANY similarly for three nodes, here is what
> pg_stat_replication tells:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 1 | quorum
> node_5434 | 1 | quorum
> node_5435 | 1 | quorum
> node_5436 | 0 | async
> (4 rows)
>
> It is not possible to guess from how many standbys this needs to wait
> for. One idea would be to mark the sync_state not as "quorum", but
> "quorum-N", or just add a new column to indicate how many in the set
> need to give a commit confirmation.

As Simon suggested before, we could support another feature that
allows the client to control the quorum number.
Considering adding that feature, I thought it's better to have and
control that information as a GUC parameter.
Thought?

Attached latest v5 patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
000_quorum_commit_v5.patch text/x-diff 27.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-11-07 15:29:15 Re: Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Previous Message Peter Eisentraut 2016-11-07 15:13:48 Re: add more NLS to bin