Re: Support for N synchronous standby servers - take 2

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: masao(dot)fujii(at)gmail(dot)com
Cc: michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-02-09 04:16:21
Message-ID: 20160209.131621.54420844.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHnTKmd90Vu19Swu0C+2mnWxvAH=1FE=-xUbo3s94pRRg(at)mail(dot)gmail(dot)com>
> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> >> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
> >>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>>>> Yes, please let's use the custom language, and let's not care of not
> >>>>> more than 1 level of nesting so as it is possible to represent
> >>>>> pg_stat_replication in a simple way for the user.
> >>>>
> >>>> "not" is used twice in this sentence in a way that renders me not able
> >>>> to be sure that I'm not understanding it not properly.
> >>>
> >>> 4 times here. Score beaten.
> >>>
> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> >>> to only support configurations up to one level of nested objects, like
> >>> that:
> >>> 2[node1, node2, node3]
> >>> node1, 2[node2, node3], node3
> >>> In short, we could restrict things so as we cannot define a group of
> >>> nodes within an existing group.
> >>
> >> No, actually, that's stupid. Having up to two nested levels makes more
> >> sense, a quite common case for this feature being something like that:
> >> 2{node1,[node2,node3]}
> >> In short, sync confirmation is waited from node1 and (node2 or node3).
> >>
> >> Flattening groups of nodes with a new catalog will be necessary to
> >> ease the view of this data to users:
> >> - group name?
> >> - array of members with nodes/groups
> >> - group type: quorum or priority
> >> - number of items to wait for in this group
> >
> > So, here are some thoughts to make that more user-friendly. I think
> > that the critical issue here is to properly flatten the meta data in
> > the custom language and represent it properly in a new catalog,
> > without messing up too much with the existing pg_stat_replication that
> > people are now used to for 5 releases since 9.0. So, I would think
> > that we will need to have a new catalog, say
> > pg_stat_replication_groups with the following things:
> > - One line of this catalog represents the status of a group or of a single node.
> > - The status of a node/group is either sync or potential, if a
> > node/group is specified more than once, it may be possible that it
> > would be sync and potential depending on where it is defined, in which
> > case setting its status to 'sync' has the most sense. If it is in sync
> > state I guess.
> > - Move sync_priority and sync_state, actually an equivalent from
> > pg_stat_replication into this new catalog, because those represent the
> > status of a node or group of nodes.
> > - group name, and by that I think that we had perhaps better make
> > mandatory the need to append a name with a quorum or priority group.
> > The group at the highest level is forcibly named as 'top', 'main', or
> > whatever if not directly specified by the user. If the entry is
> > directly a node, use the application_name.
> > - Type of group, quorum or priority
> > - Elements in this group, an element can be a group name or a node
> > name, aka application_name. If group is of type priority, the elements
> > are listed in increasing order. So the elements with lower priority
> > get first, etc. We could have one column listing explicitly a list of
> > integers that map with the elements of a group but it does not seem
> > worth it, what users would like to know is what are the nodes that are
> > prioritized. This covers the former 'priority' field of
> > pg_stat_replication.
> >
> > We may have a good idea of how to define a custom language, still we
> > are going to need to design a clean interface at catalog level more or
> > less close to what is written here. If we can get a clean interface,
> > the custom language implemented, and TAP tests that take advantage of
> > this user interface to check the node/group statuses, I guess that we
> > would be in good shape for this patch.
> >
> > Anyway that's not a small project, and perhaps I am over-complicating
> > the whole thing.
> >
> > Thoughts?
>
> I agree that we would need something like such new view in the future,
> however it seems too late to work on that for 9.6 unfortunately.
> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> 1-level priority list, now, then we can extend it to cover other cases.
>
> If we can commit the simple version too early and there is enough
> time before the date of feature freeze, of course I'm happy to review
> the extended version like you proposed, for 9.6.

I agree with Fujii-san. Many convenient additions could be built
around this, and they are all welcome, but having the fundamental
functionality in 9.6 seems far more beneficial for most of us.

As long as the extensible syntax is settled, the internal
structures can be extended gradually as the syntax is enhanced.
Definitions of more than three levels, and group names, are
reserved in the syntax and may be left unimplemented for now. JSON
could be added later, but it is too complicated for simple cases.
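
To make the "simple case" concrete, here is a rough sketch in Python
(not PostgreSQL code, and not taken from the patch) of how a 1-level
priority list could be read. It assumes the N[name, ...] notation used
in Michael's examples upthread; the function names
parse_priority_list and choose_sync_standbys are hypothetical and
exist only for this illustration. Nested or brace-delimited
definitions are rejected here, which corresponds to treating them as
reserved syntax for now.

```python
import re

# Assumed notation, borrowed from the examples upthread: N[name1, name2, ...]
# This is only an illustration; the actual grammar is still under discussion.
_ONE_LEVEL = re.compile(r"\s*(\d+)\s*\[([^\[\]{}]*)\]\s*")


def parse_priority_list(spec):
    """Parse a 1-level priority list into (count, names).

    Nested groups and brace-delimited groups are rejected, since only
    the simple case is handled in this sketch.
    """
    m = _ONE_LEVEL.fullmatch(spec)
    if m is None:
        raise ValueError("only a 1-level list like 2[a, b, c] is supported")
    count = int(m.group(1))
    names = [n.strip() for n in m.group(2).split(",")]
    if any(not n for n in names):
        raise ValueError("empty standby name in list")
    if count < 1 or count > len(names):
        raise ValueError("count must be between 1 and the number of names")
    return count, names


def choose_sync_standbys(spec, connected):
    """Split connected standbys into (sync, potential) by list order.

    Names earlier in the list have higher priority. Standbys that are
    listed but not currently connected are skipped.
    """
    count, names = parse_priority_list(spec)
    live = [n for n in names if n in connected]
    return live[:count], live[count:]
```

For example, with the spec "2[node1, node2, node3]" and all three
standbys connected, node1 and node2 would be sync and node3 would be
potential; if node1 disconnects, node2 and node3 become sync.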

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
