Re: Quorum commit for multiple synchronous replication.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2016-12-06 09:57:38
Message-ID: CAD21AoA8ptbF75cnmQnaqQwyFj+JkxVN0vbtaKF+dujBrsL=Fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> Attached latest version patch incorporated review comments. After more
>>>> thought, I agree and changed the value of standby priority in quorum
>>>> method so that it's not set 1 forcibly. The all standby priorities are
>>>> 1 If s_s_names = 'ANY(*)'.
>>>> Please review this patch.
>>>
>>> Sorry for my late reply. Here is my final lookup.
>>
>> Thank you for reviewing!
>>
>>> <synopsis>
>>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
>>> class="parameter">standby_name</replaceable> [, ...] )
>>> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>> +FIRST <replaceable class="parameter">num_sync</replaceable> (
>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>> <replaceable class="parameter">standby_name</replaceable> [, ...
>>> This can just be replaced with [ ANY | FIRST ]. There is no need for
>>> braces as the keyword is not mandatory.
>>>
>>> + is the name of a standby server.
>>> + <literal>FIRST</> and <literal>ANY</> specify the method used by
>>> + the master to control the standby servres.
>>> </para>
>>> s/servres/servers/.
>>>
>>> if (priority == 0)
>>> values[7] = CStringGetTextDatum("async");
>>> else if (list_member_int(sync_standbys, i))
>>> - values[7] = CStringGetTextDatum("sync");
>>> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>>> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>>> else
>>> values[7] = CStringGetTextDatum("potential");
>>> This can be simplified a bit as "quorum" is the state value for all
>>> standbys with a non-zero priority when the method is set to
>>> SYNC_REP_QUORUM:
>>> if (priority == 0)
>>> values[7] = CStringGetTextDatum("async");
>>> + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>> + values[7] = CStringGetTextDatum("quorum");
>>> else if (list_member_int(sync_standbys, i))
>>> values[7] = CStringGetTextDatum("sync");
>>> else
>>>
>>> SyncRepConfig data is made external to syncrep.c with this patch as
>>> walsender.c needs to look at the sync method in place, no complain
>>> about that after considering if there could be a more elegant way to
>>> do things without this change.
>>
>> Agreed.
>>
>>> While reviewing the patch, I have found a couple of incorrectly shaped
>>> sentences, both in the docs and some comments. Attached is a new
>>> version with this word-smithing. The patch is now switched as ready
>>> for committer.
>>
>> Thanks. I found a typo in v7 patch, so attached latest v8 patch.
>
> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>
> In quorum commit, we need to calculate the N-th largest LSN from
> M quorum synchronous standbys' LSN. N would be usually 1 - 3 and
> M would be 1 - 10, I guess. You used the algorithm using qsort for
> that calculation. But I'm not sure if that's enough effective algorithm
> or not.
>
> If M (i.e., number of quorum sync standbys) is enough large,
> your choice would be good. But usually M seems not so large.
>

Thank you for the comment.

One another possible idea is to use the partial selection sort[1],
which takes O(MN) time. Since this is more efficient if N is small
this would be better than qsort for this case. But I'm not sure that
we can see such a difference by result of performance measurement.

[1] https://en.wikipedia.org/wiki/Selection_algorithm#Partial_selection_sort

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-12-06 10:14:31 Re: Quorum commit for multiple synchronous replication.
Previous Message Michael Paquier 2016-12-06 09:11:27 Re: pg_dump, pg_dumpall and data durability