Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-12-09 15:29:20
Lists: pgsql-hackers

On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Oops.
>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>>> Hello,
>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
>>> > >> One question: what is the difference between the leading "n" in
>>> > >> s_s_names and the leading "n" of "n-priority"?
>>> > >
>>> > > Ah. Sorry for the ambiguous description. The 'n' in s_s_names
>>> > > represents an arbitrary integer, while the one in "n-priority"
>>> > > is literally an "n", meaning "a format with any number of
>>> > > priority hosts" as a whole. For instance,
>>> > >
>>> > > synchronous_replication_method = "n-priority"
>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>> > >
>>> > > I added the "n-" in "n-priority" to distinguish it from "1-priority",
>>> > > so if we don't provide "1-priority" for backward compatibility,
>>> > > "priority" alone would be enough to represent the type.
>>> > >
>>> > > By the way, s_r_method is not strictly necessary, but it would
>>> > > help avoid the complexity of autodetecting formats, including
>>> > > ones that are not yet defined.
>>> >
>>> > Thank you for your explanation; I understand now.
>>> >
>>> > It means that the format of s_s_names will change, which would not be good.
>>> I believe that the format for defining a "replication set"(?)
>>> is not fixed, and a more complex format would be needed to support
>>> nested definitions. That format would be very different from
>>> the current simple list of names. So this is a choice among three
>>> or possibly more designs that can tolerate future changes, I
>>> suppose.
>>> 1. Additional definition formats in the future will be stored
>>> somewhere other than s_s_names.
>>> 2. Additional formats will be stored in s_s_names, and the format
>>> will be detected automatically.
>>> 3. (ditto), but the format is designated by s_r_method.
>>> 4. Any other way?
>>> I chose the third way. What do you think about future expansion
>>> of the format?
> I agree with option #3 and the s_s_names format you suggested.
> I think that it's extensible and tolerant of future changes.
> I'm going to implement the patch based on this idea if other hackers
> agree with this design.

Please find attached a draft patch that supports multiple synchronous
standbys. This patch adds a GUC parameter, synchronous_replication_method,
which represents the method of synchronous replication.

[Design of replication method]
synchronous_replication_method currently accepts two values: 'priority'
and '1-priority'.
More values (e.g., 'quorum', 'json') could be added in the future.

* s_r_method = '1-priority'
This method is for backward compatibility, so the syntax of s_s_names
is the same as today, and so is the behavior.

* s_r_method = 'priority'
This method supports multiple synchronous standbys, selected by priority.
The syntax of s_s_names is:
<number of sync standbys>, <standby name> [, ...]
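As a rough illustration of this syntax, the following C sketch splits such a value into the required count and the ordered list of standby names. The function parse_priority_s_s_names, the SyncStandbyConfig struct and the size limits are hypothetical and not taken from the patch; rejecting a list that names fewer standbys than the required count is also this sketch's own choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STANDBYS 16   /* hypothetical limit for this sketch */
#define MAX_NAME_LEN 64   /* hypothetical limit for this sketch */

/* Hypothetical parsed form of s_s_names when s_r_method = 'priority'. */
typedef struct
{
    int  num_sync;                          /* <number of sync standbys> */
    int  num_names;                         /* number of standby names */
    char names[MAX_STANDBYS][MAX_NAME_LEN]; /* in priority order */
} SyncStandbyConfig;

/* Strip leading and trailing whitespace in place. */
static char *
trim(char *s)
{
    char *end;

    while (isspace((unsigned char) *s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Parse "<number of sync standbys>, <standby name> [, ...]".
 * Returns false if the leading number is missing or out of range, a name
 * is blank or too long, or fewer names than num_sync are listed.
 */
static bool
parse_priority_s_s_names(const char *value, SyncStandbyConfig *out)
{
    char  copy[1024];
    char *tok;
    char *endptr;
    long  n;

    assert(value != NULL && out != NULL);
    if (strlen(value) >= sizeof(copy))
        return false;
    strcpy(copy, value);

    /* First element: the number of synchronous standbys. */
    tok = strtok(copy, ",");
    if (tok == NULL)
        return false;
    tok = trim(tok);
    n = strtol(tok, &endptr, 10);
    if (endptr == tok || *endptr != '\0' || n < 1 || n > MAX_STANDBYS)
        return false;
    out->num_sync = (int) n;
    out->num_names = 0;

    /* Remaining elements: standby names, highest priority first. */
    while ((tok = strtok(NULL, ",")) != NULL)
    {
        tok = trim(tok);
        if (*tok == '\0' || strlen(tok) >= MAX_NAME_LEN ||
            out->num_names >= MAX_STANDBYS)
            return false;
        strcpy(out->names[out->num_names++], tok);
    }
    return out->num_names >= out->num_sync;
}
```

With '2, node1, node2, node3' this yields num_sync = 2 and the names node1, node2, node3 in that order, while a plain '1-priority' style list such as 'node1, node2' is rejected because its first element is not a number.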

For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
node3' means that the master waits for acknowledgement from the 2
highest-priority connected standbys (standbys listed earlier in
s_s_names have higher priority).
If 4 standbys (node1 - node4) are available, the master server waits for
acknowledgement from 'node1' and 'node2'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node2            | sync
 node3            | potential
 node4            | async
(4 rows)

After 'node2' crashes, the master waits for acknowledgement from
'node1' and 'node3'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node3            | sync
 node4            | async
(3 rows)
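The two listings above are consistent with a simple selection rule, sketched here in C under stated assumptions: walk s_s_names in order, mark the first <number of sync standbys> connected standbys as sync, the remaining listed ones as potential, and any unlisted ones as async. The names assign_sync_states, standby_priority and SyncState are hypothetical; the patch's own code may be organized differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the sync_state values shown by pg_stat_replication. */
typedef enum { SYNC_STATE_ASYNC, SYNC_STATE_POTENTIAL, SYNC_STATE_SYNC } SyncState;

/* Position of name in s_s_names (1 = highest priority), or 0 if not listed. */
static int
standby_priority(const char *name, const char **s_s_names, int n_names)
{
    for (int i = 0; i < n_names; i++)
        if (strcmp(name, s_s_names[i]) == 0)
            return i + 1;
    return 0;
}

/*
 * Assign a sync_state to each connected standby: the first num_sync
 * connected standbys in s_s_names order become sync, other listed ones
 * potential, and unlisted ones async.
 */
static void
assign_sync_states(const char **connected, int n_connected,
                   const char **s_s_names, int n_names,
                   int num_sync, SyncState *states)
{
    int chosen = 0;

    assert(num_sync >= 0);

    /* Default: listed standbys are candidates, unlisted ones are async. */
    for (int i = 0; i < n_connected; i++)
        states[i] = standby_priority(connected[i], s_s_names, n_names) > 0
            ? SYNC_STATE_POTENTIAL : SYNC_STATE_ASYNC;

    /* Promote the first num_sync connected standbys in priority order. */
    for (int p = 0; p < n_names && chosen < num_sync; p++)
    {
        for (int i = 0; i < n_connected; i++)
        {
            if (strcmp(connected[i], s_s_names[p]) == 0)
            {
                states[i] = SYNC_STATE_SYNC;
                chosen++;
                break;
            }
        }
    }
}
```

Running it with node1 - node4 connected, and then again without node2, gives the same sync_state columns as the two listings: node3 moves from potential to sync once node2 is gone.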

[Changing replication method]
To change the replication method, first change s_r_method and reload the
configuration with pg_reload_conf(). Only after that should s_s_names be
changed.
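One plausible reason for this ordering, assuming that s_s_names is validated against the currently active s_r_method, is that the same string can be valid under one method and invalid under the other. The C sketch below illustrates that; SyncRepMethodKind, sync_rep_method_from_name and s_s_names_valid_for are illustrative names, not the patch's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of synchronous_replication_method. */
typedef enum
{
    SYNC_REP_METHOD_INVALID,
    SYNC_REP_METHOD_1_PRIORITY,
    SYNC_REP_METHOD_PRIORITY
} SyncRepMethodKind;

/* Map the GUC string to a method; unknown names are rejected. */
static SyncRepMethodKind
sync_rep_method_from_name(const char *name)
{
    if (strcmp(name, "1-priority") == 0)
        return SYNC_REP_METHOD_1_PRIORITY;
    if (strcmp(name, "priority") == 0)
        return SYNC_REP_METHOD_PRIORITY;
    return SYNC_REP_METHOD_INVALID;
}

/*
 * Check s_s_names against the currently active method. Under '1-priority'
 * the value is a plain name list as today; under 'priority' it must start
 * with a positive integer followed by a comma-separated name list.
 */
static bool
s_s_names_valid_for(SyncRepMethodKind method, const char *value)
{
    const char *p = value;
    char       *endptr;
    long        n;

    assert(value != NULL);
    while (isspace((unsigned char) *p))
        p++;

    switch (method)
    {
        case SYNC_REP_METHOD_1_PRIORITY:
            return true;        /* any name list, as today */
        case SYNC_REP_METHOD_PRIORITY:
            n = strtol(p, &endptr, 10);
            if (endptr == p || n < 1)
                return false;   /* leading number missing or not positive */
            while (isspace((unsigned char) *endptr))
                endptr++;
            return *endptr == ',';
        default:
            return false;
    }
}
```

Under this assumption, a value like 'node1, node2' is fine while s_r_method is '1-priority' but would be rejected once 'priority' is active, which is why the method and the name list have to be switched in a consistent order.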

[Expanding replication method]
To add a new replication method, two functions must be implemented
for it:
* int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
This function obtains the list of standbys considered synchronous
at that moment and returns its length.
* bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
This function obtains the LSNs (write, flush) considered synced.
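A sketch of how these two hooks could be wired into a per-method dispatch table is below. Only the two function signatures come from the description above; SyncRepMethodRoutine, lookup_method, the FakeWalSnd array standing in for walsender shared memory, and reporting the oldest write/flush position among the sync standbys are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* same width as PostgreSQL's XLogRecPtr */

/* Hypothetical stand-in for one walsender's state in shared memory. */
typedef struct
{
    XLogRecPtr write;
    XLogRecPtr flush;
    int        priority;        /* position in s_s_names; 0 = not listed */
} FakeWalSnd;

#define MAX_WALSND 8
static FakeWalSnd walsnds[MAX_WALSND];
static int        n_walsnds;
static int        num_sync = 2; /* hypothetical: leading number of s_s_names */

/* Per-method hooks, using the two signatures described above. */
typedef struct
{
    const char *name;
    int         (*get_sync_standbys) (int *sync_standbys);
    bool        (*get_sync_lsn) (XLogRecPtr *write_pos, XLogRecPtr *flush_pos);
} SyncRepMethodRoutine;

/* 'priority': the num_sync listed standbys with the lowest priority values. */
static int
SyncRepGetSynchronousStandbysPriority(int *sync_standbys)
{
    int n = 0;

    for (int i = 0; i < n_walsnds; i++)
    {
        int better = 0;

        if (walsnds[i].priority == 0)
            continue;           /* not in s_s_names: async */
        /* Count listed standbys ranked ahead of this one. */
        for (int j = 0; j < n_walsnds; j++)
            if (walsnds[j].priority != 0 &&
                (walsnds[j].priority < walsnds[i].priority ||
                 (walsnds[j].priority == walsnds[i].priority && j < i)))
                better++;
        if (better < num_sync)
            sync_standbys[n++] = i;
    }
    return n;
}

/* Synced LSNs: the oldest write/flush positions among the sync standbys. */
static bool
SyncRepGetSyncLsnPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
{
    int sync[MAX_WALSND];
    int n;

    assert(num_sync >= 1);
    n = SyncRepGetSynchronousStandbysPriority(sync);
    if (n < num_sync)
        return false;           /* not enough sync standbys connected */

    *write_pos = *flush_pos = UINT64_MAX;
    for (int k = 0; k < n; k++)
    {
        if (walsnds[sync[k]].write < *write_pos)
            *write_pos = walsnds[sync[k]].write;
        if (walsnds[sync[k]].flush < *flush_pos)
            *flush_pos = walsnds[sync[k]].flush;
    }
    return true;
}

static const SyncRepMethodRoutine sync_rep_methods[] = {
    {"priority", SyncRepGetSynchronousStandbysPriority, SyncRepGetSyncLsnPriority},
};

/* Find the hooks for the method named by s_r_method, or NULL if unknown. */
static const SyncRepMethodRoutine *
lookup_method(const char *name)
{
    for (size_t i = 0; i < sizeof(sync_rep_methods) / sizeof(sync_rep_methods[0]); i++)
        if (strcmp(sync_rep_methods[i].name, name) == 0)
            return &sync_rep_methods[i];
    return NULL;
}
```

Taking the minimum is what makes a position safe to report as synced: every selected sync standby has reached it. A new method such as 'quorum' would only need another row in the table with its own pair of functions.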

Also, this patch still contains debug code; you can trace this behavior
by enabling the DEBUG_REPLICATION macro.

Please give me feedback.


Masahiko Sawada

Attachment Content-Type Size
000_multi_sync_replication_v1.patch application/octet-stream 20.6 KB
