Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-12-11 17:03:09
Message-ID: CAD21AoApriUVxUvtUGvt9fgMo=fxYkk8iv8McccwGRBsxprStA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 9, 2015 at 8:59 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Oops.
>>>
>>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>>>> Hello,
>>>>
>>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
>>>> > >> One question: what is the difference between the leading "n" in
>>>> > >> s_s_names and the leading "n" of "n-priority"?
>>>> > >
>>>> > > Ah. Sorry for the ambiguous description. The 'n' in s_s_names
>>>> > > represents an arbitrary integer, and the one in "n-priority"
>>>> > > is literally the letter "n", meaning "a format with any number of
>>>> > > priority hosts" as a whole. For instance,
>>>> > >
>>>> > > synchronous_replication_method = "n-priority"
>>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>>> > >
>>>> > > I added the "n-" in "n-priority" to distinguish it from "1-priority",
>>>> > > so if we don't provide "1-priority" for backward compatibility,
>>>> > > "priority" would be enough to represent the type.
>>>> > >
>>>> > > By the way, s_r_method is not strictly necessary, but it matters
>>>> > > for avoiding the complexity of autodetecting formats,
>>>> > > including ones that are not defined yet.
>>>> >
>>>> > Thank you for your explanation, I understand now.
>>>> >
>>>> > It means that the format of s_s_names will change, which would not be good.
>>>>
>>>> I believe that the format of the "replication set"(?) definition
>>>> is not fixed yet, and it would need a more complex format to support
>>>> nested definitions. That would be very different from
>>>> the current simple list of names. So this is a choice among three
>>>> or possibly more designs, made so that future changes can be
>>>> tolerated, I suppose.
>>>>
>>>> 1. Additional definition formats in the future will be stored
>>>> somewhere other than s_s_names.
>>>>
>>>> 2. Additional formats will be stored in s_s_names, and the format
>>>> will be detected automatically.
>>>>
>>>> 3. (ditto), the format is designated by s_r_method.
>>>>
>>>> 4. Any other way?
>>>>
>>>> I chose the third way. What do you think about future expansion
>>>> of the format?
>>>>
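As a rough illustration of option 3, the sketch below picks a parser for the standby-name string from the method name, so no format autodetection is needed. The function names and the returned tuple shape are hypothetical and are not taken from any patch; the "n-priority" format simply follows the "2, mercury, venus, ..." example quoted above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical result shape: (number of sync standbys, ordered standby names).
SyncConfig = Tuple[int, List[str]]


def parse_one_priority(names: str) -> SyncConfig:
    """Current format: a plain comma-separated list, one sync standby."""
    standbys = [n.strip() for n in names.split(",") if n.strip()]
    return 1, standbys


def parse_n_priority(names: str) -> SyncConfig:
    """Proposed format: '<count>, <name> [, ...]'."""
    fields = [f.strip() for f in names.split(",") if f.strip()]
    if not fields or not fields[0].isdecimal():
        raise ValueError("expected a leading number of sync standbys")
    count = int(fields[0])
    if count < 1:
        raise ValueError("number of sync standbys must be at least 1")
    return count, fields[1:]


# The method name designates the format; adding a format adds an entry here.
PARSERS: Dict[str, Callable[[str], SyncConfig]] = {
    "1-priority": parse_one_priority,
    "n-priority": parse_n_priority,
}


def parse_sync_config(method: str, names: str) -> SyncConfig:
    """Parse the standby-name string using the parser chosen by method."""
    try:
        parser = PARSERS[method]
    except KeyError:
        raise ValueError(f"unknown replication method: {method!r}") from None
    return parser(names)
```

With this shape, an unknown method fails loudly instead of being guessed from the string contents, which is the main point of designating the format explicitly.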
>>
>> I agree with option #3 and the s_s_names format you suggested.
>> I think that it's extensible and can tolerate future changes.
>> I'm going to implement the patch based on this idea if other hackers
>> agree with this design.
>>
>
> Please find attached a draft patch which supports multi sync replication.
> This patch adds a GUC parameter synchronous_replication_method, which
> represents the method of synchronous replication.
>
> [Design of replication method]
> synchronous_replication_method accepts two values for now: 'priority' and
> '1-priority'.
> We can add more values (e.g., 'quorum', 'json', etc.) in the future.
>
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is the same as today.
> The behavior is the same as well.
>
> * s_r_method = 'priority'
> This method is for replication to multiple synchronous standbys chosen by priority.
> The syntax of s_s_names is:
> <number of sync standbys>, <standby name> [, ...]
>
> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for acknowledgement from the 2
> standbys with the lowest priority values, i.e. the ones listed earliest.
> If 4 standbys (node1 - node4) are available, the master waits for
> acknowledgement from 'node1' and 'node2'.
> The status of each WAL sender is:
>
> =# select application_name, sync_state from pg_stat_replication order by application_name;
>  application_name | sync_state
> ------------------+------------
>  node1            | sync
>  node2            | sync
>  node3            | potential
>  node4            | async
> (4 rows)
>
> After 'node2' crashes, the master waits for acknowledgement from
> 'node1' and 'node3'.
> The status of each WAL sender is then:
>
> =# select application_name, sync_state from pg_stat_replication order by application_name;
>  application_name | sync_state
> ------------------+------------
>  node1            | sync
>  node3            | sync
>  node4            | async
> (3 rows)
>
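The two outputs above can be reproduced with a small sketch of priority-based selection. This is only an illustration of the behavior described in this mail, not the patch's code: sync_states and its arguments are hypothetical names, and the rule that listed-but-not-selected standbys are 'potential' while unlisted ones are 'async' is inferred from the example output.

```python
from typing import Dict, List


def sync_states(num_sync: int, names: List[str],
                connected: List[str]) -> Dict[str, str]:
    """Classify each connected standby as 'sync', 'potential' or 'async'."""
    # Listed standbys that are currently connected, in priority order
    # (earlier in the list means higher priority).
    candidates = [n for n in names if n in connected]
    chosen = candidates[:num_sync]

    states = {}
    for standby in connected:
        if standby in chosen:
            states[standby] = "sync"
        elif standby in candidates:
            states[standby] = "potential"
        else:
            # Connected but not named in the list at all.
            states[standby] = "async"
    return states


names = ["node1", "node2", "node3"]
before = sync_states(2, names, ["node1", "node2", "node3", "node4"])
after = sync_states(2, names, ["node1", "node3", "node4"])
```

Here before marks node3 as 'potential', and after promotes it to 'sync' once node2 is gone, matching the two query results.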
> [Changing replication method]
> When we want to change the replication method, we have to change
> s_r_method first, and then call pg_reload_conf().
> After the replication method has changed, we can change s_s_names.
>
> [Expanding replication method]
> To add a new replication method, we need to implement two functions
> for it:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
> This function obtains the list of standbys considered synchronous
> at that time, and returns its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
> This function obtains the LSNs (write, flush) considered synced.
>
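To make the second function's role more concrete, here is a minimal sketch under one assumption that is not stated in this mail: for the priority method, a position counts as synced once every currently synchronous standby has reported it, so the result is the minimum of the reported positions. get_sync_lsn, the dictionaries and the integer LSNs are hypothetical stand-ins, not the patch's data structures.

```python
from typing import Dict, List, Optional, Tuple

Lsn = int  # stand-in for XLogRecPtr in this sketch


def get_sync_lsn(sync_standbys: List[str],
                 write_pos: Dict[str, Lsn],
                 flush_pos: Dict[str, Lsn]) -> Optional[Tuple[Lsn, Lsn]]:
    """Return (write, flush) LSNs acknowledged by all sync standbys.

    Returns None when there is no sync standby, mirroring a 'false'
    result from the bool-returning C function.
    """
    if not sync_standbys:
        return None
    # Assumption: the slowest sync standby bounds what counts as synced.
    write = min(write_pos[s] for s in sync_standbys)
    flush = min(flush_pos[s] for s in sync_standbys)
    return write, flush


result = get_sync_lsn(
    ["node1", "node2"],
    write_pos={"node1": 500, "node2": 450, "node3": 900},
    flush_pos={"node1": 480, "node2": 400, "node3": 900},
)
```

A different method (for example a quorum-based one) would plug in a different rule here, which is why each method supplies its own pair of functions.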
> Also, the debug code still remains in this patch; you can trace the
> behavior by enabling the DEBUG_REPLICATION macro.
>
> Please give me feedback.
>

I've attached an updated patch.
Please give me feedback.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_multi_sync_replication_v2.patch application/octet-stream 20.6 KB
