Re: Support for N synchronous standby servers - take 2

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-02-05 08:36:38
Message-ID: CAB7nPqSwO3qC_-JVRDbK++=D_vHPFJV0nOStjMxQ+4XbXgAzTQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> Yes, please let's use the custom language, and let's not care of not
>>>> more than 1 level of nesting so as it is possible to represent
>>>> pg_stat_replication in a simple way for the user.
>>>
>>> "not" is used twice in this sentence in a way that renders me not able
>>> to be sure that I'm not understanding it not properly.
>>
>> 4 times here. Score beaten.
>>
>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> to only support configurations up to one level of nested objects, like
>> this:
>> 2[node1, node2, node3]
>> node1, 2[node2, node3], node3
>> In short, we could restrict things so that we cannot define a group of
>> nodes within an existing group.
>
> No, actually, that's stupid. Having up to two nested levels makes more
> sense, a quite common case for this feature being something like this:
> 2{node1,[node2,node3]}
> In short, sync confirmation is awaited from node1 and from (node2 or node3).
>
> Flattening groups of nodes with a new catalog will be necessary to
> make this data easy for users to view:
> - group name?
> - array of members with nodes/groups
> - group type: quorum or priority
> - number of items to wait for in this group
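To make the nesting rules above concrete, here is a minimal Python sketch of a parser for this kind of syntax. It is only an illustration, not the grammar of the patch, and it rests on assumptions: `[...]` denotes a priority group and `{...}` a quorum group (reading the `2{node1,[node2,node3]}` example that way), a missing count means 1, a bare top-level list behaves like today's synchronous_standby_names (a priority list where one standby is sync), and standby names are simple identifiers. `parse_standby_names`, `Group` and `MAX_DEPTH` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Hypothetical limit: a top-level group plus one level of sub-groups.
MAX_DEPTH = 2

# Tokens: integer counts, simple identifiers (assumed application_name
# syntax), and the punctuation [ ] { } ,
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_][A-Za-z0-9_]*)|([\[\]{},]))")
PUNCT = {"[", "]", "{", "}", ","}


@dataclass
class Group:
    kind: str                            # "priority" for [...], "quorum" for {...}
    num: int                             # number of elements to wait for
    members: List[Union[str, "Group"]]   # standby names or sub-groups


def tokenize(text: str) -> List[str]:
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2) or m.group(3))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse_list(self):
        # list := element (',' element)*
        items = [self.parse_element()]
        while self.peek() == ",":
            self.take(",")
            items.append(self.parse_element())
        return items

    def parse_element(self):
        # element := [count] '[' list ']' | [count] '{' list '}' | name
        tok = self.peek()
        num = None
        if tok is not None and tok.isdigit():
            num = int(self.take())
            tok = self.peek()
        if tok in ("[", "{"):
            close, kind = ("]", "priority") if tok == "[" else ("}", "quorum")
            self.take(tok)
            members = self.parse_list()
            self.take(close)
            num = 1 if num is None else num
            if not 1 <= num <= len(members):
                raise ValueError(f"cannot wait for {num} of {len(members)} elements")
            return Group(kind, num, members)
        if num is not None:
            raise ValueError("a count must be followed by '[' or '{'")
        if tok is None or tok in PUNCT:
            raise ValueError(f"expected a standby name, got {tok!r}")
        return self.take()


def depth(node) -> int:
    # A plain standby name has depth 0; each enclosing group adds one level.
    if isinstance(node, str):
        return 0
    return 1 + max(depth(m) for m in node.members)


def parse_standby_names(text: str) -> Group:
    p = Parser(text)
    items = p.parse_list()
    if p.peek() is not None:
        raise ValueError(f"unexpected trailing token {p.peek()!r}")
    if len(items) == 1 and isinstance(items[0], Group):
        root = items[0]
    else:
        # Assumption: a bare list acts like today's synchronous_standby_names,
        # i.e. a priority group waiting for one standby.
        root = Group("priority", 1, items)
    if depth(root) > MAX_DEPTH:
        raise ValueError(f"groups may be nested at most {MAX_DEPTH} levels deep")
    return root
```

For example, `parse_standby_names("2{node1,[node2,node3]}")` yields a quorum group waiting for 2 elements, `node1` and a priority sub-group over `node2` and `node3`, while a three-level string such as `1[a, 1[b, 1{c, d}]]` is rejected. Doing the depth check after parsing keeps the grammar itself simple and makes the nesting limit a single constant.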

So, here are some thoughts on making this more user-friendly. I think
that the critical issue here is to properly flatten the metadata of
the custom language and represent it in a new catalog, without
disturbing too much the existing pg_stat_replication, which people
have been used to for 5 releases now, since 9.0. So I think that we
will need a new catalog, say pg_stat_replication_groups, containing
the following:
- One line of this catalog represents the status of a group or of a single node.
- The status of a node/group is either sync or potential. If a
node/group is specified more than once, it could be sync in one place
and potential in another depending on where it is defined, in which
case setting its status to 'sync' makes the most sense, at least if it
is actually in sync state, I guess.
- Move sync_priority and sync_state (or rather an equivalent of them)
from pg_stat_replication into this new catalog, because those represent
the status of a node or group of nodes.
- Group name. For that, I think we had better make it mandatory to
attach a name to a quorum or priority group. The group at the highest
level is forcibly named 'top', 'main', or whatever, if the user does
not specify a name directly. If the entry is directly a node, use its
application_name.
- Type of group: quorum or priority.
- Elements in this group. An element can be a group name or a node
name, aka application_name. If the group is of type priority, the
elements are listed in increasing order of priority number, so the
elements with the lowest priority number come first, etc. We could
have one column explicitly listing the integers that map to the
elements of a group, but it does not seem worth it: what users want to
know is which nodes are prioritized. This covers the former
sync_priority field of pg_stat_replication.
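The flattening described in the list above could look roughly like the following Python sketch, which turns a group tree into one row per group or node. The `Group`/`Row` types and `flatten` are hypothetical, and the status rules are simplifying assumptions rather than anything decided in this thread: every standby is treated as connected, the first `num` elements of a priority group and all elements of a quorum group are marked 'sync', an element can only be 'sync' if its enclosing group is, and a node or group defined more than once keeps 'sync' if any of its occurrences is 'sync'.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Group:
    name: str                            # group name; 'top' for the highest level
    kind: str                            # 'priority' or 'quorum'
    num: int                             # number of elements to wait for
    members: List[Union[str, "Group"]]   # application_name strings or sub-groups


@dataclass
class Row:
    name: str                            # group name, or application_name for a node
    kind: Optional[str]                  # 'priority' / 'quorum' for groups, None for nodes
    num: Optional[int]                   # elements to wait for (groups only)
    elements: Optional[List[str]]        # member names, in definition (priority) order
    status: str                          # 'sync' or 'potential'


STATUS_RANK = {"potential": 0, "sync": 1}


def flatten(top: Group) -> List[Row]:
    rows: Dict[str, Row] = {}            # one row per name, in first-seen order

    def record(name, kind, num, elements, status):
        row = rows.get(name)
        if row is None:
            rows[name] = Row(name, kind, num, elements, status)
        elif STATUS_RANK[status] > STATUS_RANK[row.status]:
            # Same node/group defined more than once: 'sync' wins.
            row.status = status

    def visit(group: Group, status: str):
        elements = [m if isinstance(m, str) else m.name for m in group.members]
        record(group.name, group.kind, group.num, elements, status)
        for pos, member in enumerate(group.members):
            # Simplified model: every standby is connected, so in a priority
            # group the first `num` elements are sync; in a quorum group all are.
            inner = "sync" if group.kind == "quorum" or pos < group.num else "potential"
            # An element can only be sync if its enclosing group is sync.
            member_status = inner if status == "sync" else "potential"
            if isinstance(member, str):
                record(member, None, None, None, member_status)
            else:
                visit(member, member_status)

    visit(top, "sync")
    return list(rows.values())


# Hypothetical configuration: node3 is listed in two groups.
example = Group("top", "quorum", 2, [
    Group("g1", "priority", 1, ["node1", "node3", "node5"]),
    Group("g2", "quorum", 1, ["node3", "node4"]),
])
rows = flatten(example)
```

Here node3 is only a potential standby within g1 but a member of the quorum group g2, so its merged row ends up as 'sync', while node5 stays 'potential'. Keying rows by name is what lets a node defined in several places collapse into a single line, as proposed above.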

We may have a good idea of how to define the custom language, but we
still need to design a clean interface at the catalog level, more or
less close to what is written here. If we can get a clean interface,
the custom language implemented, and TAP tests that use this user
interface to check the node/group statuses, I guess that we would be
in good shape for this patch.

Anyway, that's not a small project, and perhaps I am over-complicating
the whole thing.

Thoughts?
--
Michael
