Re: Support for N synchronous standby servers - take 2

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-07-29 12:28:37
Message-ID: CAB7nPqSU=07zYBU-mEyiQOHpcTmBY=0VbJd9GjGCWzEVB9E0KQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 29, 2015 at 9:03 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Jul 21, 2015 at 3:50 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Mon, Jul 20, 2015 at 9:59 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>>> Simon Riggs wrote:
>>>
>>>> The choice between formats is not
>>>> solely predicated on whether we have multi-line support.
>>>
>>>> I still think writing down some actual use cases would help bring the
>>>> discussion to a conclusion. Inventing a general facility is hard without
>>>> some clear goals about what we need to support.
>>>
>>> We need to at least support the following:
>>> - Grouping: Specify a set of standbys along with the minimum number of commits
>>> required from the group.
>>> - Group Type: A group can be either a priority group or a quorum group.
>>
>> As far as I understand, at the lowest level a group is just an alias
>> for a list of nodes; quorum and priority are properties that can be
>> applied to a group of nodes when this group is used in the expression
>> that defines what synchronous commit means.
>>
>>> - Group names: to simplify status reporting
>>> - Nesting: At least 2 levels of nesting
>>
>> If I am following correctly, at the first level there is the
>> definition of the top level objects, like groups and sync expression.
>>
>
> Grouping is similar to using the same application_name on different servers.
> How does the same application_name on different servers work?

In the case of a priority group, both nodes get the same priority.
Imagine, for example, that we need to wait for 2 nodes in priority
order: node1 with priority 1, node2 with priority 2, and a second node2
also with priority 2. We would wait for the first one, and then for
either of the two node2 entries. In a quorum group, any of them could
qualify for selection.
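
To make the difference concrete, here is a rough sketch of the two
behaviours. It is only an illustration of the semantics described above,
not code from any patch; the function names, the (connection id,
priority) tuples, and the "node2-a"/"node2-b" identifiers for the two
standbys sharing one application_name are all invented for the example.

```python
from collections import defaultdict

def priority_satisfied(entries, acked, needed):
    """entries: list of (conn_id, priority); acked: set of conn_ids that confirmed.

    Slots are filled level by level, lowest priority value first. Entries
    sharing a priority value are interchangeable within their level.
    """
    by_prio = defaultdict(list)
    for conn_id, prio in entries:
        by_prio[prio].append(conn_id)

    remaining = needed
    for prio in sorted(by_prio):
        level = by_prio[prio]
        want = min(remaining, len(level))          # slots this level must fill
        got = sum(1 for c in level if c in acked)  # confirmations at this level
        if got < want:
            return False
        remaining -= want
        if remaining == 0:
            return True
    return remaining == 0

def quorum_satisfied(entries, acked, needed):
    """Any `needed` confirmations are enough, whatever the priorities."""
    return sum(1 for conn_id, _ in entries if conn_id in acked) >= needed

# Hypothetical setup: two standbys share application_name "node2".
entries = [("node1", 1), ("node2-a", 2), ("node2-b", 2)]
```

With these entries and needed = 2, a priority group is satisfied by
node1 plus either node2 entry, but not by the two node2 entries alone;
a quorum group accepts any two of the three.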

>>> Using JSON, sync rep parameter to replicate in 2 different clusters could be
>>> written as:
>>>
>>> {"remotes":
>>>   {"quorum": 2,
>>>    "servers": [
>>>      {"london":
>>>        {"priority": 2,
>>>         "servers": ["lndn1", "lndn2", "lndn3"]
>>>        }},
>>>      {"nyc":
>>>        {"priority": 1,
>>>         "servers": ["ny1", "ny2"]
>>>        }}
>>>    ]
>>>   }
>>> }
>>> The same parameter in the new language (as suggested above) could be written
>>> as:
>>> 'remotes: 2(london: 1[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'
>>
>> OK, there is a typo. That's actually 2(london: 2[lndn1, lndn2, lndn3],
>> nyc: 1[ny1, ny2]) in your grammar. Honestly, if we want group aliases,
>> I think that JSON makes the most sense. One of the advantages of a
>> group is that you can use it in several places in the blob and set
>> different properties on it, hence we should be able to define a
>> group outside of the sync expression.
>> Hence I would think that something like this makes more sense:
>> {
>>   "sync_standby_names":
>>   {
>>     "quorum": 2,
>>     "nodes":
>>     [
>>       {"priority": 1, "group": "cluster1"},
>>       {"quorum": 2, "nodes": ["node1", "node2", "node3"]}
>>     ]
>>   },
>>   "groups":
>>   {
>>     "cluster1": ["node11", "node12", "node13"],
>>     "cluster2": ["node21", "node22", "node23"]
>>   }
>> }
>>
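
As an illustration of how group aliases defined outside the sync
expression could be resolved, here is a small sketch that expands
{"group": ...} references into plain node lists. The expand() helper
and the shape of its output are assumptions made for this example only;
nothing here reflects an actual parser.

```python
import json

# The blob proposed in the quoted message, unchanged.
CONFIG = """
{
  "sync_standby_names": {
    "quorum": 2,
    "nodes": [
      {"priority": 1, "group": "cluster1"},
      {"quorum": 2, "nodes": ["node1", "node2", "node3"]}
    ]
  },
  "groups": {
    "cluster1": ["node11", "node12", "node13"],
    "cluster2": ["node21", "node22", "node23"]
  }
}
"""

def expand(node, groups):
    """Recursively replace {"group": name} references with the member list.

    Plain strings are node names and are returned unchanged. An unknown
    group name raises KeyError.
    """
    if isinstance(node, str):
        return node
    # Keep the properties (quorum/priority), rebuild the member list.
    out = {k: v for k, v in node.items() if k not in ("group", "nodes")}
    if "group" in node:
        out["nodes"] = list(groups[node["group"]])
    else:
        out["nodes"] = [expand(child, groups) for child in node["nodes"]]
    return out

cfg = json.loads(CONFIG)
resolved = expand(cfg["sync_standby_names"], cfg["groups"])
```

Because the alias is looked up at the point of use, the same group
could appear several times in the expression with a different quorum or
priority property each time.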
>>> Also, I was thinking the name of the main group could be optional.
>>> Internally, it can be given the name 'default group' or 'main group' for
>>> status reporting.
>>>
>>> The above could also be written as:
>>> '2(london: 2[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'
>>>
>>> backward compatible:
>>> In JSON, while validating we may have to check if it starts with '{' to go
>>
>> Something worth noting: application_name can begin with "{".
>>
>>> for JSON parsing else proceed with the current method.
>>
>>> A,B,C => 1[A,B,C]. This can be added in the new parser code.
>>
>> This makes sense. We could do the same for the JSON-based format
>> by reusing the in-memory structure used to deparse the blob when the
>> former grammar is used.
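
A minimal sketch of that idea, assuming the in-memory structure is a
dictionary shaped like the JSON proposed earlier in this thread. Both
function names are hypothetical, and the comma splitting ignores quoted
identifiers. Since an application_name can start with "{", the sketch
does not trust the first character alone: it tries a JSON parse and
falls back to the former grammar when that fails.

```python
import json

def legacy_to_structure(value):
    """Map the former grammar 'A,B,C' to the equivalent of 1[A,B,C]."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    return {"priority": 1, "nodes": names}

def parse_setting(value):
    """Return the in-memory structure for either format."""
    if value.lstrip().startswith("{"):
        try:
            parsed = json.loads(value)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            # Not valid JSON: treat it as a plain list of names.
            pass
    return legacy_to_structure(value)
```

A value that is both valid JSON and a legal application_name remains
ambiguous under this scheme, so this fallback alone does not settle the
backward-compatibility question.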
>
> If I validate the s_s_name JSON syntax, I will definitely use JSONB
> rather than JSON,
> because JSONB already has useful functions for adding a node to or
> deleting a node from s_s_name.
> But the downside of using JSONB for s_s_name is that it can reorder
> the keys (and it removes duplicate keys).
> For example, in the syntax Michael suggested,
> [...]
> "groups" and "sync_standby_names" have switched places. I'm not sure
> that's good for users.

I think that's perfectly fine.
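
The order of keys in a JSON object carries no meaning, so the parsed
result is the same either way. A quick way to see this, using Python's
json module as a stand-in for the server-side parser (the sample values
are shortened and made up for the example):

```python
import json

as_written = (
    '{"sync_standby_names": {"quorum": 2, "nodes": ["node1", "node2"]},'
    ' "groups": {"cluster1": ["node11"]}}'
)
reordered = (
    '{"groups": {"cluster1": ["node11"]},'
    ' "sync_standby_names": {"nodes": ["node1", "node2"], "quorum": 2}}'
)

# Object comparison ignores key order.
same = json.loads(as_written) == json.loads(reordered)

# Duplicate keys collapse to the last value, as they do with JSONB.
dup = json.loads('{"quorum": 1, "quorum": 2}')
```

Here same is True and dup holds only the last "quorum" value.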
--
Michael
