Re: Quorum commit for multiple synchronous replication.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-10-17 07:00:06
Message-ID: CAD21AoAs+V1iVK9ki75gw6Z0cm2O+TKcznuSrwJd4GJajNGJww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
>>> quorum commit.
>>> That is,
>>> 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs
>>> from k standby servers whose name appear earlier in the list.
>>> 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs
>>> from any k listed standby servers.
>>> 3. 'n1, n2, n3' is the same as #1 with k=1.
>>> 4. '(n1, n2, n3)' is the same as #2 with k=1.
>>
>> OK, so I have done a review of this patch keeping that in mind as
>> that's the consensus. I am still getting familiar with the code...
>
> Thank you for reviewing!
>
>> - transactions will wait until all the standby servers which are considered
>> + transactions will wait until the multiple standby servers which
>> are considered
>> There is no real need to update this sentence.
>>
>> + <literal>FIRST</> means to control the standby servers with
>> + different priorities. The synchronous standbys will be those
>> + whose name appear earlier in this list, and that are both
>> + currently connected and streaming data in real-time(as shown
>> + by a state of <literal>streaming</> in the
>> + <link linkend="monitoring-stats-views-table">
>> + <literal>pg_stat_replication</></link> view). Other standby
>> + servers appearing later in this list represent potential
>> + synchronous standbys. If any of the current synchronous
>> + standbys disconnects for whatever reason, it will be replaced
>> + immediately with the next-highest-priority standby.
>> + For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
>> + makes transaction commits wait until their WAL records are received
>> + by three higher-priority standbys chosen from standby servers
>> + <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
>> It does not seem necessary to me to enter in this level of details:
>> The keyword FIRST, coupled with an integer number N, chooses the first
>> N higher-priority standbys and makes transaction commit when their WAL
>> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
>> makes transaction commits wait until their WAL records are received by
>> the three high-priority standbys chosen from standby servers s1, s2,
>> s3 and s4.
>
> Will fix.
>
>> + <literal>ANY</> means to control all of standby servers with
>> + same priority. The master sever will wait for receipt from
>> + at least <replaceable class="parameter">num_sync</replaceable>
>> + standbys, which is quorum commit in the literature. The all of
>> + listed standbys are considered as candidate of quorum commit.
>> + For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
>> + transaction commits wait until receiving receipts from at least
>> + any three standbys of four listed servers <literal>s1</>,
>> + <literal>s2</>, <literal>s3</>, <literal>s4</>.
>>
>> Similarly, something like that...
>> The keyword ANY, coupled with an integer number N, chooses N standbys
>> in a set of standbys with the same, lowest, priority and makes
>> transaction commit when WAL records are received those N standbys. For
>> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
>> records have been received from 3 servers in the set s1, s2, s3 and
>> s4.
>
> Will fix.
>
>> It could be good also to mention that no keyword specified means ANY,
>> which is incompatible with 9.6. The docs also miss the fact that if a
>> simple list of servers is given, without parenthesis and keywords,
>> this is equivalent to FIRST 1.
>
> Right. I will add those documentations.
>
>> -synchronous_standby_names = '2 (s1, s2, s3)'
>> +synchronous_standby_names = 'First 2 (s1, s2, s3)'
>> Nit here. It may be a good idea to just use upper-case characters in
>> the docs, or just lower-case for consistency, but not mix both.
>> Usually GUCs use lower-case characters.
>
> Agree. Will fix.
>
>> + when standby is considered as a condidate of quorum commit.</entry>
>> s/condidate/candidate/
>
> Will fix.
>
>> -syncrep_scanner.c: FLEXFLAGS = -CF -p
>> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i
>> Hm... Is that actually a good idea? Now "NODE" and "node" are two
>> different things for application_name, but with this patch both would
>> have the same meaning. I am getting to think that we could just use
>> the lower-case characters for the keywords any/first. Is this -i
>> switch a problem for elements in standby_list?
>
> The string of standby name is not changed actually, only the parser
> doesn't distinguish between "NODE" and "node".
> The values used for checking application_name will still works fine.
> If we want to name "first" or "any" as the standby name then it should
> be double quoted.
>
>> + * Calculate the 'pos' newest Write, Flush and Apply positions among
>> sync standbys.
>> I don't understand this comment.
>>
>> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> + got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
>> + &applyPtr, &am_sync);
>> + else /* SYNC_REP_QUORUM */
>> + got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
>> + &applyPtr,
>> SyncRepConfig->num_sync,
>> + &am_sync);
>> Those could be grouped together, there is no need to have pos as an argument.
>
> Will fix.
>
>> + /* In quroum method, all sync standby priorities are always 1 */
>> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> + priority = 1;
>> This is dead code, SyncRepGetSyncStandbysPriority is not called for
>> QUORUM.
>
> Well, this code is in SyncRepGetStandbyPriority which is called by
> SyncRepInitConifig.
> SyncRepGetStandbyPriority can be called regardless of the the
> synchronization method.
>
>
>> You may want to add an assert in
>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
>> sure that they are getting called for the correct method.
>> + /* Sort each array in descending order to get 'pos' newest element */
>> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> There is no need to reorder things again and to use arrays, you can
>> choose the newest LSNs when scanning the WalSnd entries.
>
> I considered it that but it depends on performance.
> Current patch avoids O(N*M).
>

Attached latest patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
000_quorum_commit_v4.patch binary/octet-stream 25.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2016-10-17 07:14:42 Re: FSM corruption leading to errors
Previous Message Heikki Linnakangas 2016-10-17 06:42:16 Re: When is TopMemoryContext released ?