Re: Quorum commit for multiple synchronous replication.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-10-11 07:18:01
Message-ID: CAD21AoDvp4y1HFZgUr_Gsompx_0Uudj__+S0xTu73QKJ1zYNCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
>> quorum commit.
>> That is,
>> 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs
>> from k standby servers whose name appear earlier in the list.
>> 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs
>> from any k listed standby servers.
>> 3. 'n1, n2, n3' is the same as #1 with k=1.
>> 4. '(n1, n2, n3)' is the same as #2 with k=1.
>
> OK, so I have done a review of this patch keeping that in mind as
> that's the consensus. I am still getting familiar with the code...

Thank you for reviewing!

> - transactions will wait until all the standby servers which are considered
> + transactions will wait until the multiple standby servers which
> are considered
> There is no real need to update this sentence.
>
> + <literal>FIRST</> means to control the standby servers with
> + different priorities. The synchronous standbys will be those
> + whose name appear earlier in this list, and that are both
> + currently connected and streaming data in real-time(as shown
> + by a state of <literal>streaming</> in the
> + <link linkend="monitoring-stats-views-table">
> + <literal>pg_stat_replication</></link> view). Other standby
> + servers appearing later in this list represent potential
> + synchronous standbys. If any of the current synchronous
> + standbys disconnects for whatever reason, it will be replaced
> + immediately with the next-highest-priority standby.
> + For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
> + makes transaction commits wait until their WAL records are received
> + by three higher-priority standbys chosen from standby servers
> + <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
> It does not seem necessary to me to enter in this level of details:
> The keyword FIRST, coupled with an integer number N, chooses the first
> N higher-priority standbys and makes transaction commit when their WAL
> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
> makes transaction commits wait until their WAL records are received by
> the three high-priority standbys chosen from standby servers s1, s2,
> s3 and s4.

Will fix.

> + <literal>ANY</> means to control all of standby servers with
> + same priority. The master sever will wait for receipt from
> + at least <replaceable class="parameter">num_sync</replaceable>
> + standbys, which is quorum commit in the literature. The all of
> + listed standbys are considered as candidate of quorum commit.
> + For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
> + transaction commits wait until receiving receipts from at least
> + any three standbys of four listed servers <literal>s1</>,
> + <literal>s2</>, <literal>s3</>, <literal>s4</>.
>
> Similarly, something like that...
> The keyword ANY, coupled with an integer number N, chooses N standbys
> in a set of standbys with the same, lowest, priority and makes
> transaction commit when WAL records are received those N standbys. For
> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
> records have been received from 3 servers in the set s1, s2, s3 and
> s4.

Will fix.

> It could be good also to mention that no keyword specified means ANY,
> which is incompatible with 9.6. The docs also miss the fact that if a
> simple list of servers is given, without parenthesis and keywords,
> this is equivalent to FIRST 1.

Right. I will add those documentations.

> -synchronous_standby_names = '2 (s1, s2, s3)'
> +synchronous_standby_names = 'First 2 (s1, s2, s3)'
> Nit here. It may be a good idea to just use upper-case characters in
> the docs, or just lower-case for consistency, but not mix both.
> Usually GUCs use lower-case characters.

Agree. Will fix.

> + when standby is considered as a condidate of quorum commit.</entry>
> s/condidate/candidate/

Will fix.

> -syncrep_scanner.c: FLEXFLAGS = -CF -p
> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i
> Hm... Is that actually a good idea? Now "NODE" and "node" are two
> different things for application_name, but with this patch both would
> have the same meaning. I am getting to think that we could just use
> the lower-case characters for the keywords any/first. Is this -i
> switch a problem for elements in standby_list?

The string of standby name is not changed actually, only the parser
doesn't distinguish between "NODE" and "node".
The values used for checking application_name will still works fine.
If we want to name "first" or "any" as the standby name then it should
be double quoted.

> + * Calculate the 'pos' newest Write, Flush and Apply positions among
> sync standbys.
> I don't understand this comment.
>
> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> + got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
> + &applyPtr, &am_sync);
> + else /* SYNC_REP_QUORUM */
> + got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
> + &applyPtr,
> SyncRepConfig->num_sync,
> + &am_sync);
> Those could be grouped together, there is no need to have pos as an argument.

Will fix.

> + /* In quroum method, all sync standby priorities are always 1 */
> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> + priority = 1;
> This is dead code, SyncRepGetSyncStandbysPriority is not called for
> QUORUM.

Well, this code is in SyncRepGetStandbyPriority which is called by
SyncRepInitConifig.
SyncRepGetStandbyPriority can be called regardless of the the
synchronization method.

> You may want to add an assert in
> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
> sure that they are getting called for the correct method.
> + /* Sort each array in descending order to get 'pos' newest element */
> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
> There is no need to reorder things again and to use arrays, you can
> choose the newest LSNs when scanning the WalSnd entries.

I considered it that but it depends on performance.
Current patch avoids O(N*M).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gilles Darold 2016-10-11 07:32:52 Re: proposal: psql \setfileref
Previous Message Torsten Zuehlsdorff 2016-10-11 07:08:21 Re: kqueue