Re: Support for N synchronous standby servers - take 2

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-11-13 03:52:12
Message-ID: 20151113.125212.102628436.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg(at)mail(dot)gmail(dot)com>
> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
...
> >> This patch is a tool for discussion, so I'm not going to fix this
> >> bug until we reach consensus.
> >>
> >> We are still discussing to find a solution that can get consensus.
> >> I felt that it's difficult to choose between the two approaches
> >> within this development cycle, and there would not be time to
> >> implement such a big feature even if we chose one.
> >> But this feature is obviously needed by many users.
> >> So I'm considering simpler and more extensible solutions; the idea
> >> I posted is one of them.
> >> Another approach worth considering is just specifying the number
> >> of sync standbys. It can also cover the main use cases.
> >
> > Yes, it covers the main and simple use case like "I want to have
> > multiple synchronous replicas!". Even if we miss quorum commit in
> > the first version, the feature is still very useful.

+1

> It can cover not only the case you mentioned but also the main use
> case Michael mentioned, by setting the same application_name.
> And the first-version patch is almost implemented, so it just needs
> to be reviewed.
>
> I think it would be good to implement the simple feature in the
> first version, and then refine the design based on opinions and
> feedback from more users and use cases.

Yeah. I agree with it. And I have two proposals in this
direction.

- Notation

Keeping synchronous_standby_names, and adding
synchronous_replication_method as a variable to select other
syntaxes, would probably draw no argument except over the new
variable's name. But I feel synchronous_standby_num looks a bit
too specific.

I'd like to propose the following, hoping it doesn't reprise the
argument over the notation for replication definitions :p

The following two GUCs should be enough to accommodate future
expansion of the notation syntax and/or method.

synchronous_standby_names : as it is

synchronous_replication_method:

The default is "1-priority", which keeps the current meaning.
Possible additional values so far would be:

"n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
where n is the number of required acknowledgements.

"n-quorum": the format of s_s_names is the same as above, but it is
read in a quorum context.

These could be expanded later, for example:

"complex" : Michael's format.
"json" : JSON?
"json-ext": specify JSON in an external file.

Even after we have such complex notations, I suppose that many use
cases will be covered by the first three notations.
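
For illustration only, under the proposed notation a setup that
requires acknowledgement from two of three listed standbys might be
written like this (the standby names are made up, and the syntax is
just a sketch of the proposal, nothing settled):

    synchronous_replication_method = 'n-priority'
    synchronous_standby_names = '2, tokyo, osaka, nagoya'

    # or, to accept acknowledgement from any two of the three:
    synchronous_replication_method = 'n-quorum'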

- Internal design

What SyncRepReleaseWaiters() should do is calculate the pair of LSNs
that can be regarded as synced, decide whether *this* walsender has
advanced that LSN pair, and then try to release the backends waiting
for those LSNs *only if* this walsender has advanced them.

Seen from that point, the proposed patch makes redundant attempts to
release backends.

In addition to that, the patch looks like a mixture of the current
implementation and the new feature. They serve the same objective,
so they cannot coexist with each other, I think. As a result, code
for both the quorum and priority judgements appears at multiple
levels of the call tree. This would be an obstacle to future
(possible) expansion.

So, I think this feature should be implemented as follows.

SyncRepInitConfig reads the configuration and stores the resulting
structure somewhere such as WalSnd->syncrepset_definition, instead
of WalSnd->sync_standby_priority, which should be removed. Nothing
would be stored if the current walsender is not a member of the
defined replication set. Storing a pointer to a matching function
there would increase flexibility, but such an implementation would
in turn make the code harder to read.. (I often find myself looking
for the entity behind xlogreader->read_page() ;)
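
Just to make that concrete, the stored definition might look
something like the sketch below. Every name and field here is purely
illustrative, not a concrete proposal.

    /* Illustrative sketch of a parsed replication-set definition that
     * a walsender could keep in WalSnd->syncrepset_definition. */
    typedef enum SyncRepMethod
    {
        SYNCREP_METHOD_1_PRIORITY,  /* current single-standby behavior */
        SYNCREP_METHOD_N_PRIORITY,  /* top-n priority standbys must ack */
        SYNCREP_METHOD_N_QUORUM     /* any n listed standbys must ack */
    } SyncRepMethod;

    typedef struct SyncRepSetDefinition
    {
        SyncRepMethod method;    /* how to interpret the member list */
        int           num_sync;  /* the "n" in n-priority / n-quorum */
        int           nmembers;  /* number of names from s_s_names */
        char        **members;   /* standby names, in listed order */
    } SyncRepSetDefinition;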

Then SyncRepSyncedLsnAdvancedTo(), instead of
SyncRepGetSynchronousStandbys(), returns the LSN pair that can be
regarded as 'synced' according to the specified replication set
definition, and whether this walsender has advanced those LSNs.
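
Just to pin down the intended interface (the signature below is only
illustrative, and the parameter order simply follows the sketch
further down):

    /* Compute the flush/write LSN pair that the configured replication
     * set currently regards as synced, store them into *flushPtr and
     * *writePtr, and return true only when this walsender's report has
     * advanced that pair. */
    static bool SyncRepSyncedLsnAdvancedTo(XLogRecPtr *flushPtr,
                                           XLogRecPtr *writePtr);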

Finally, SyncRepReleaseWaiters() uses it to release backends if
needed.

The differences among quorum, priority, and any other methods are
confined to SyncRepSyncedLsnAdvancedTo(). As a result,
SyncRepReleaseWaiters() would look like the following.

| SyncRepReleaseWaiters(void)
| {
|     if (MyWalSnd->syncrepset_definition == NULL || ...)
|         return;
|     ...
|     if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
|     {
|         /* I haven't advanced the synced LSNs */
|         LWLockRelease(SyncRepLock);
|         return;
|     }
|     /* Set the lsn first so that when we wake backends they will release...

I haven't thought concretely about what SyncRepSyncedLsnAdvancedTo()
would do internally, but I believe it can be done in an efficient
manner :p

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
