Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-11-17 09:13:11
Message-ID: CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 17, 2015 at 9:57 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ(at)mail(dot)gmail(dot)com>
>> > - Notation
>> >
>> > synchronous_standby_names, and synchronous_replication_method as
>> > a variable to provide other syntax is probably no argument
>> > except its name. But I feel synchronous_standby_num looks bit
>> > too specific.
>> >
>> > I'd like to propose if this doesn't reprise the argument on
>> > notation for replication definitions:p
>> >
>> > The following two GUCs would be enough to bear future expansion
>> > of notation syntax and/or method.
>> >
>> > synchronous_standby_names : as it is
>> >
>> > synchronous_replication_method:
>> >
>> > default is "1-priority", which means the same with the current
>> > meaning. possible additional values so far would be,
>> >
>> > "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
>> > where n is the number of required acknowledges.
>>
>> One question is that what is different between the leading "n" in
>> s_s_names and the leading "n" of "n-priority"?
>
> Ah. Sorry for the ambiguous description. 'n' in s_s_names
> representing an arbitrary integer number and that in "n-priority"
> is literally an "n", meaning "a format with any number of
> priority hosts" as a whole. As an instance,
>
> synchronous_replication_method = "n-priority"
> synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>
> I added "n-" of "n-priority" to distinguish with "1-priority" so
> if we won't provide "1-priority" for backward compatibility,
> "priority" would be enough to represent the type.
>
> By the way, s_r_method is not essentially necessary but it would
> be important to avoid complexity of autodetection of formats
> including currently undefined ones.

Than you for your explanation, I understood that.

It means that the format of s_s_names will be changed, which would be not good.
So, how about the adding just s_r_method parameter and the number of
required ACK is represented in the leading of s_r_method?
For example, the following setting is same as above.

synchronous_replication_method = "2-priority"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

In quorum method, we can set;
synchronous_replication_method = "2-quorum"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

Thought?

>
>
>> > "n-quorum": the format of s_s_names is the same as above, but
>> > it is read in quorum context.
>
> The "n" of this is the same as above.
>
>> > These can be expanded, for example, as follows, but in future.
>> >
>> > "complex" : Michael's format.
>> > "json" : JSON?
>> > "json-ext": specify JSON in external file.
>> >
>> > Even after we have complex notations, I suppose that many use
>> > cases are coverd by the first tree notations.
>>
>> I'm not sure it's desirable to implement the all kind of methods into core.
>> I think it's better to extend replication in order to be more
>> extensibility like adding hook function.
>> And then other approach is implemented as a contrib module.
>
> I agree with you. I proposed the following internal design having
> that in mind.
>
>> > - Internal design
>> >
>> > What should be done in SyncRepReleaseWaiters() is calculating a
>> > pair of LSNs that can be regarded as synced and decide whether
>> > *this* walsender have advanced the LSN pair, then trying to
>> > release backends that wait for the LSNs *if* this walsender has
>> > advanced them.
>> >
>> > From such point, the proposed patch will make redundant trials
>> > to release backens.
>> >
>> > Addition to that, the patch looks to be a mixture of the current
>> > implement and the new feature. These are for the same objective
>> > so they cannot coexist each other, I think. As the result, codes
>> > for both quorum/priority judgement appear at multiple level in
>> > call tree. This would be an obstacle for future (possible)
>> > expansion.
>> >
>> > So, I think this feature should be implemented as following,
>> >
>> > SyncRepInitConfig reads the configuration and stores the result
>> > structure into elsewhere such like WalSnd->syncrepset_definition
>> > instead of WalSnd->sync_standby_priority, which should be
>> > removed. Nothing would be stored if the current wal sender is
>> > not a member of the defined replication set. Storing a pointer
>> > to matching function there would increase the flexibility but
>> > such implement in contrast will make the code difficult to be
>> > read.. (I often look for the entity of xlogreader->read_page()
>> > ;)
>> >
>> > Then SyncRepSyncedLsnAdvancedTo() instead of
>> > SyncRepGetSynchronousStandbys() returns an LSN pair that can be
>> > regarded as 'synced' according to specified definition of
>> > replication set and whether this walsender have advanced the
>> > LSNs.
>> >
>> > Finally, SyncRepReleaseWaiters() uses it to release backends if
>> > needed.
>> >
>> > The differences among quorum/priority or others are confined in
>> > SyncRepSyncedLsnAdvancedTo(). As the result,
>> > SyncRepReleaseWaiters would look as following.
>> >
>> > | SyncRepReleaseWaiters(void)
>> > | {
>> > | if (MyWalSnd->syncrepset_definition == NULL || ...)
>> > | return;
>> > | ...
>> > | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
>> > | {
>> > | /* I haven't advanced the synced LSNs */
>> > | LWLockRelease(SyncRepLock);
>> > | rerturn;
>> > | }
>> > | /* Set the lsn first so that when we wake backends they will relase...
>> >
>> > I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
>> > does but perhaps yes we can:p in effective manner..
>> >
>> > What do you think about this?
>>
>> I agree with this design.
>> What SyncRepSyncedLsnAdvancedTo() does would be different for each
>> method, so we can implement "n-priority" style multiple sync
>> replication at first version.
>
> Maybe the first *additional* one if we decide to keep backward
> compatibility, as the discussion above.
>

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2015-11-17 09:15:35 Re: Speed up Clog Access by increasing CLOG buffers
Previous Message konstantin knizhnik 2015-11-17 09:09:09 Re: Question concerning XTM (eXtensible Transaction Manager API)