Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-07 07:55:30
Message-ID: CAD21AoCfSX_MbOze8C--zoDJatUfaJcF5PzZ-ELqpigNBQ_ERw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Reply to multiple hackers.
Thank you for reviewing this patch.

> + used. Priority is given to servers in the order that the appear
> in the list.
>
> s/the appear/they appear/
>
> - The minimum wait time is the roundtrip time between primary to standby.
> + The minimum wait time is the roundtrip time between the primary and the
> + almost synchronous standby.
>
> s/almost/slowest/

Will fix this typo. Thanks!

On Fri, Mar 4, 2016 at 5:22 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> Sorry for long, hard-to-read writings in advance..
>
> At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q(at)mail(dot)gmail(dot)com>
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>> > This accepts 'abc^Id' as a name, which is wrong behavior (but
>> > such appliction names are not allowed anyway. If you assume so,
>> > I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>> application_name
>> ------------------
>> abc^Id
>> (1 row)
>
> Sorry, I implicitly used "^" in the meaning of "ctrl key". So
> "^I" is so-called Ctrl-I, that is horizontal tab or 0x09. So the
> following in psql shows that.
>
> =# set application_name to E'abc\td';
> =# show application_name ;
> application_name
> ------------------
> ab?d
> (1 row)
>
> The <tab> is replaced with '?' (literally) at the time of
> guc assinment.

Oh, I see.
I will comment for that.

>> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> > char ychar) requires differnt character types. Is there any reason
>> > for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
> Umm. My qustion might have been a bit out of the point.
>
> The addlitchar_xd_string(str,unsigned char c) does
> appendStringInfoChar(, c). On the other hand, the signature of
> the function of stringinfo is the following.
>
> AppendStringInfoChar(StringInfo str, char ch);
>
> Of course "char" is equivalent of "signed char" as
> default. addlitchar_xd_string assigns the given character in
> "unsigned char" to the parameter of AppendStringInfoChar of
> "signed char".
>
> These two are incompatible types. Imagine the
> following codelet,
>
> #include <stdio.h>
>
> void hoge(signed char c){
> int ch = c;
> fprintf(stderr, "char = %d\n", ch);
> }
>
> int main(void)
> {
> unsigned char u;
>
> u = 200;
> hoge(u);
> return 0;
> }
>
> The result is -56. So we generally should get rid of such type of
> mixture of signedness for no particular reason.
>
> In this case, the domain of the variable is 0x20-0x7e so no
> problem won't be actualized but also there's no reason for the
> signedness mixture.

Thank you for explanation.
I will fix this.

>> > I personally don't like addlit*string() things for such simple
>> > syntax but itself is acceptble enough for me. However it uses
>> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> > of memory chunk for every double-quoted name. The chunks are
>> > finally stacked up left uncollected until the current
>> > memorycontext is deleted or reset (It is deleted just after
>> > finishing config file processing). Addition to that, setting
>> > s_s_names runs the parser twice. It seems to me too greedy and
>> > seems that static char [NAMEDATALEN] is enough using the v12 way
>> > without palloc/repalloc.
>>
>> I though that length of group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>
> Such long names doesn't seem to necessary. Too long identifiers
> no longer act as identifier for human eyeballs. We are limiting
> the length of identifiers of the whole database system to
> NAMEDATALEN-1, which seems to have been enough so I don't see any
> reason to have a group name longer than that.
>

I see. I will fix this.

>> > I found that the name SyncGroupName.wait_num is not
>> > instinctive. How about sync_num, sync_member_num or
>> > sync_standby_num? If the last is preferable, .members also should
>> > be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>> > I am quite uncomfortable with the existence of
>> > WanSnd.sync_standby_priority. It represented the pirority in the
>> > old linear s_s_names format but nested groups or even
>> > single-level quarum list obviously doesn't fit it. Can we get rid
>> > of sync_standby_priority, even though we realize atmost
>> > n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node priority, I was thinking that standbys in quorum
>> list have same priority, and in nested group each standbys are given
>> the priority starting from 1.
>
> As far as I can see the varialbe is referred to as a boolean to
> indicate whether a walsernder is connected to a candidate
> synchronous standby. So the value is totally useless, at least
> for now. However, SyncRepRelaseWaiters uses the value to check if
> the synced LSNs can be advaned by a walsender so the variable is
> useful as a boolean.
>
> In the previous versions, the reason why WanSnd had the priority
> value is that a pair of synchronized LSNs is determined only by
> one wansender, which has the highest priority among active
> wansenders. So even if a walsender receives a response from
> walreceiver, it doesn't need to do nothing if it is not at the
> highest priority. It's a simple world.
>
> In the quorum commit word, in contrast, what
> SyncRepGetSyncStandbysFn shoud do is returning certain private
> information to be used to calculate a pair of safe/synched LSNs
> in SyncRepGetSYncedLsnsFn looking into WalSndCtl->wansnds
> list. The latter passes a pair of safe/synced LSNs to the upper
> level list or SyncRepSyncedLsnAdvancedTo as the topmost
> caller. There's no room for sync_standby_priority to work as the
> original objective.
>
> Even if we assign the value in the explained way, the values are
> always 1 for quorum method and duplicate values for multiple
> priority method. What do you want to show by the value to users?

I agree with you.
When we implement nested style of multiple sync replication, it would
tough to show to users using by sync_standby_priority.
But in current our first goal (implementing 1-nest style), it doesn't
seem to need to get rid of sync_standby_priority from WalSnd so far,
no?
Towards multiple nested style, I'm roughly planning to have new system
view is defined like follows.

- New system view shows all groups and nodes informations.
- Move sync_state from pg_stat_replication to new system view.
- Get rid of sync_priority from pg_stat_replication.
- Add new sync_state 'quorum' that indicates candidate sync standbys
of its group using quorum method.
- If parent group state is potential, 'potential:' prefix is added to
the child standby's sync_state.

* s_s_names = '2[a, 1(b,c):group1, 1[d,e]:gorup2]'
name | sync_method | member | sync_num |
sync_state | parant_group
-----------+--------------------+---------------------------+---------------+--------------------------+--------------
main | priority | {a,group1,group2} | 2 |
|
a | | |
| sync | main
group1 | quorum | {b,c} | 1 |
sync | main
b | | |
| sync | group1
c | | |
| potential | group1
group2 | priority | {d,e} | 1
| potential | main
d | | |
| potential:sync | group2
e | | |
| potential:potential | group2
(8 rows)

* s_s_names = '2(a, 1[b,c]:group1, 1(d,e):group2)'
name | sync_method | member | sync_num |
sync_state | parant_group
-----------+--------------------+--------------------------+----------------+--------------------------+--------------
main | quorum | {a,group1,group2} | 2 |
|
a | | |
| quorum | main
group1 | priority | {b,c} | 1
| quorum | main
b | | |
| sync | group1
c | | |
| potential | group1
group2 | quorum | {d,e} | 1 |
quorum | main
d | | |
| quorum | group2
e | | |
| quorum | group2
(8 rows)

>> > SyncRepStandbys are to be in multilevel and the struct is
>> > naturally allowed to be so but SyncRepClearStandbyGroupList
>> > assumes it in single level.
>>
>> Because I think that we don't need to implement to fully support
>> nested style at first version.
>> We have to carefully design this feature while considering
>> expandability, but overkill implementation could be cause of crash.
>> Consider remaining time for 9.6, I feel we could implement quorum
>> method at best.
>
> Yes, so I proposed to ass Aseert() in the function.

Will add it.

Regards,

--
Masahiko Sawada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-03-07 07:57:29 Re: Parallel Aggregate
Previous Message Amit Langote 2016-03-07 07:16:30 Re: [PROPOSAL] VACUUM Progress Checker.