Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-11-16 16:09:57
Message-ID: CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 13, 2015 at 12:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg(at)mail(dot)gmail(dot)com>
>> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> ...
>> >> This patch is a tool for discussion, so I'm not going to fix this bug
>> >> until getting consensus.
>> >>
>> >> We are still under the discussion to find solution that can get consensus.
>> >> I felt that it's difficult to select from the two approaches within
>> >> this development cycle, and there would not be time to implement such
>> >> big feature even if we selected.
>> >> But this feature is obviously needed by many users.
>> >> So I'm considering more simple and extensible something solution, the
>> >> idea I posted is one of them.
>> >> The another worth considering approach is that just specifying the
>> >> number of sync standby. It also can cover the main use cases in
>> >> some-cases.
>> >
>> > Yes, it covers main and simple use case like "I want to have multiple
>> > synchronous replicas!". Even if we miss quorum commit at the first
>> > version, the feature is still very useful.
>
> +1
>
>> It can cover not only the case you mentioned but also main use case
>> Michael mentioned by setting same application_name.
>> And that first version patch is almost implemented, so just needs to
>> be reviewed.
>>
>> I think that it would be good to implement the simple feature at the
>> first version, and then coordinate the design based on opinion and
>> feed backs from more user, use-case.
>
> Yeah. I agree with it. And I have two proposals in this
> direction.
>
> - Notation
>
> synchronous_standby_names, and synchronous_replication_method as
> a variable to provide other syntax is probably no argument
> except its name. But I feel synchronous_standby_num looks bit
> too specific.
>
> I'd like to propose if this doesn't reprise the argument on
> notation for replication definitions:p
>
> The following two GUCs would be enough to bear future expansion
> of notation syntax and/or method.
>
> synchronous_standby_names : as it is
>
> synchronous_replication_method:
>
> default is "1-priority", which means the same with the current
> meaning. possible additional values so far would be,
>
> "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
> where n is the number of required acknowledges.

One question is that what is different between the leading "n" in
s_s_names and the leading "n" of "n-priority"?

>
> "n-quorum": the format of s_s_names is the same as above, but
> it is read in quorum context.
>
> These can be expanded, for example, as follows, but in future.
>
> "complex" : Michael's format.
> "json" : JSON?
> "json-ext": specify JSON in external file.
>
> Even after we have complex notations, I suppose that many use
> cases are coverd by the first tree notations.

I'm not sure it's desirable to implement the all kind of methods into core.
I think it's better to extend replication in order to be more
extensibility like adding hook function.
And then other approach is implemented as a contrib module.

>
> - Internal design
>
> What should be done in SyncRepReleaseWaiters() is calculating a
> pair of LSNs that can be regarded as synced and decide whether
> *this* walsender have advanced the LSN pair, then trying to
> release backends that wait for the LSNs *if* this walsender has
> advanced them.
>
> From such point, the proposed patch will make redundant trials
> to release backens.
>
> Addition to that, the patch looks to be a mixture of the current
> implement and the new feature. These are for the same objective
> so they cannot coexist each other, I think. As the result, codes
> for both quorum/priority judgement appear at multiple level in
> call tree. This would be an obstacle for future (possible)
> expansion.
>
> So, I think this feature should be implemented as following,
>
> SyncRepInitConfig reads the configuration and stores the result
> structure into elsewhere such like WalSnd->syncrepset_definition
> instead of WalSnd->sync_standby_priority, which should be
> removed. Nothing would be stored if the current wal sender is
> not a member of the defined replication set. Storing a pointer
> to matching function there would increase the flexibility but
> such implement in contrast will make the code difficult to be
> read.. (I often look for the entity of xlogreader->read_page()
> ;)
>
> Then SyncRepSyncedLsnAdvancedTo() instead of
> SyncRepGetSynchronousStandbys() returns an LSN pair that can be
> regarded as 'synced' according to specified definition of
> replication set and whether this walsender have advanced the
> LSNs.
>
> Finally, SyncRepReleaseWaiters() uses it to release backends if
> needed.
>
> The differences among quorum/priority or others are confined in
> SyncRepSyncedLsnAdvancedTo(). As the result,
> SyncRepReleaseWaiters would look as following.
>
> | SyncRepReleaseWaiters(void)
> | {
> | if (MyWalSnd->syncrepset_definition == NULL || ...)
> | return;
> | ...
> | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
> | {
> | /* I haven't advanced the synced LSNs */
> | LWLockRelease(SyncRepLock);
> | rerturn;
> | }
> | /* Set the lsn first so that when we wake backends they will relase...
>
> I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
> does but perhaps yes we can:p in effective manner..
>
> What do you think about this?

I agree with this design.
What SyncRepSyncedLsnAdvancedTo() does would be different for each
method, so we can implement "n-priority" style multiple sync
replication at first version.

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Catalin Iacob 2015-11-16 16:16:25 Re: proposal: multiple psql option -c
Previous Message Alvaro Herrera 2015-11-16 15:19:23 Re: Making tab-complete.c easier to maintain