
Re: Support for N synchronous standby servers - take 2

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	thomas(dot)munro(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-04 08:22:28
Message-ID:	20160304.172228.29892605.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

Sorry in advance for the long, hard-to-read message.

At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q(at)mail(dot)gmail(dot)com>
> Hi,
>
> Thank you so much for reviewing this patch!
>
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
>
> > This accepts 'abc^Id' as a name, which is wrong behavior (but
> > such application names are not allowed anyway. If you assume so,
> > I'd like to see a comment for that.).
>
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
> application_name
> ------------------
> abc^Id
> (1 row)

Sorry, I used "^" to mean the Ctrl key without saying so. So
"^I" is the so-called Ctrl-I, that is, a horizontal tab (0x09). The
following psql session shows what happens to it.

=# set application_name to E'abc\td';
=# show application_name ;
application_name
------------------
abc?d
(1 row)

The <tab> is replaced with a literal '?' at the time of
GUC assignment.
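
To illustrate the replacement seen above, here is a minimal sketch in C.
It assumes the rule is "every byte outside printable ASCII (0x20-0x7e)
becomes '?'". The name clean_ascii_sketch is hypothetical; this is not
the server's actual routine.

```c
/*
 * Hypothetical helper, for illustration only: overwrite, in place, every
 * byte outside printable ASCII (0x20..0x7e) with a literal '?'.
 */
static void
clean_ascii_sketch(char *str)
{
    char *p;

    for (p = str; *p != '\0'; p++)
    {
        /* Compare as unsigned so bytes >= 0x80 are not seen as negative. */
        unsigned char c = (unsigned char) *p;

        if (c < 0x20 || c > 0x7e)
            *p = '?';
    }
}
```

Under that assumption a name containing a tab could never match the
stored application_name, because the tab has already been replaced.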

> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> > char ychar) requires different character types. Is there any reason
> > for that?
>
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.

Umm. My question might have been a bit off the point.

addlitchar_xd_string(str, unsigned char c) calls
appendStringInfoChar(, c). On the other hand, the stringinfo
function has the following signature.

void appendStringInfoChar(StringInfo str, char ch);

Plain "char" is signed by default on most common platforms
(strictly speaking it is implementation-defined). So
addlitchar_xd_string passes a character held in "unsigned char"
to a parameter of appendStringInfoChar that is effectively
"signed char".

These two types are not compatible. Consider the
following snippet,

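#include <stdio.h>

/* Takes a signed char, as a "char" parameter does where char is signed. */
void hoge(signed char c)
{
    int ch = c;     /* sign-extended when widened to int */

    fprintf(stderr, "char = %d\n", ch);
}

int main(void)
{
    unsigned char u;

    u = 200;        /* does not fit in signed char */
    hoge(u);        /* implicitly converted to signed char */
    return 0;
}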

On typical platforms the result is -56. So in general we should
avoid mixing signedness like this when there is no particular
reason for it.

In this case the domain of the variable is 0x20-0x7e, so no
problem will actually arise, but there is also no reason for the
signedness mixture.
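
The claim about the 0x20-0x7e domain can be checked with a small
sketch. The helper names below are hypothetical, and the conversion of
out-of-range values to signed char is implementation-defined, so only
the "unchanged or not" property is examined, not a particular negative
value.

```c
/*
 * Convert through signed char, as happens when an unsigned char argument
 * is passed to a parameter declared "signed char".
 */
static int
via_signed_char(unsigned char u)
{
    signed char s = (signed char) u;    /* implementation-defined above 127 */

    return (int) s;
}

/* Return 1 if every byte value in [lo, hi] survives the conversion unchanged. */
static int
range_is_safe(int lo, int hi)
{
    int c;

    for (c = lo; c <= hi; c++)
    {
        if (via_signed_char((unsigned char) c) != c)
            return 0;
    }
    return 1;
}
```

range_is_safe(0x20, 0x7e) holds, while a range that reaches past 0x7f
does not, which is why the mixture is harmless here but still worth
avoiding.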

> > I personally don't like addlit*string() things for such simple
> > syntax but itself is acceptable enough for me. However it uses
> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
> > of memory chunk for every double-quoted name. The chunks are
> > finally stacked up left uncollected until the current
> > memorycontext is deleted or reset (It is deleted just after
> > finishing config file processing). Addition to that, setting
> > s_s_names runs the parser twice. It seems to me too greedy and
> > seems that static char [NAMEDATALEN] is enough using the v12 way
> > without palloc/repalloc.
>
> I thought that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?

Such long names don't seem necessary. Identifiers that are too long
no longer work as identifiers for human eyeballs. We limit the
length of identifiers across the whole database system to
NAMEDATALEN-1, which seems to have been enough, so I don't see any
reason to have a group name longer than that.
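
As a rough sketch of the fixed-buffer alternative, assuming excess
characters are simply dropped: the names name_buf, name_reset and
name_addchar are hypothetical and are not taken from the patch.
NAMEDATALEN is redefined locally (64 is PostgreSQL's default) only so
that the sketch stands alone.

```c
#include <stddef.h>

#define NAMEDATALEN 64          /* PostgreSQL's default, repeated here */

/* One static buffer reused for every quoted name: no palloc/repalloc. */
static char   name_buf[NAMEDATALEN];
static size_t name_len = 0;

/* Start collecting a new name. */
static void
name_reset(void)
{
    name_len = 0;
    name_buf[0] = '\0';
}

/*
 * Append one character. Returns 1 on success, or 0 (dropping the
 * character) once NAMEDATALEN-1 bytes are already stored.
 */
static int
name_addchar(char c)
{
    if (name_len >= NAMEDATALEN - 1)
        return 0;
    name_buf[name_len++] = c;
    name_buf[name_len] = '\0';
    return 1;
}
```

A real lexer would probably report over-long names instead of silently
truncating them; that choice is left open here.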

> > I found that the name SyncGroupName.wait_num is not
> > instinctive. How about sync_num, sync_member_num or
> > sync_standby_num? If the last is preferable, .members also should
> > be .standbys .
>
> Thanks, sync_num is preferable to me.
>
> ===
> > I am quite uncomfortable with the existence of
> > WalSnd.sync_standby_priority. It represented the priority in the
> > old linear s_s_names format but nested groups or even
> > single-level quorum list obviously doesn't fit it. Can we get rid
> > of sync_standby_priority, even though we realize at most
> > n-priority for now?
>
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.

As far as I can see, the variable is used only as a boolean that
indicates whether a walsender is connected to a candidate
synchronous standby, so its numeric value is useless, at least
for now. SyncRepReleaseWaiters does use it to check whether
a walsender may advance the synced LSNs, so the variable is
still useful as a boolean.

In the previous versions, WalSnd had the priority value because
a pair of synchronized LSNs is determined by only one walsender,
the one with the highest priority among the active
walsenders. So even if a walsender receives a response from its
walreceiver, it doesn't need to do anything unless it has the
highest priority. It's a simple world.

In the quorum commit world, in contrast, what
SyncRepGetSyncStandbysFn should do is return certain private
information that SyncRepGetSyncedLsnsFn uses, looking into the
WalSndCtl->walsnds list, to calculate a pair of safe/synced
LSNs. The latter passes that pair of LSNs to the upper-level
list, or to SyncRepSyncedLsnAdvancedTo as the topmost
caller. There's no room for sync_standby_priority to serve its
original purpose.

Even if we assign the value in the way you explained, it is
always 1 for the quorum method, and values are duplicated for the
multiple priority method. What do you want to show users with the value?
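
To make the contrast between the two worlds concrete, here is a
minimal sketch. It is not code from the patch: StandbySketch and both
functions are hypothetical, only one LSN per standby is tracked instead
of a pair, and 0 stands for "no synced LSN". XLogRecPtr is a 64-bit
unsigned integer as in PostgreSQL.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical, simplified view of one walsender slot. */
typedef struct StandbySketch
{
    int        priority;    /* 1 = highest; 0 = not a sync candidate */
    int        active;      /* nonzero if connected */
    XLogRecPtr flush;       /* last LSN reported as flushed */
} StandbySketch;

/* Old linear rule: only the highest-priority active candidate counts. */
static XLogRecPtr
synced_lsn_priority(const StandbySketch *s, int n)
{
    int best = -1;
    int i;

    for (i = 0; i < n; i++)
    {
        if (!s[i].active || s[i].priority <= 0)
            continue;
        if (best < 0 || s[i].priority < s[best].priority)
            best = i;
    }
    return (best < 0) ? 0 : s[best].flush;
}

/*
 * Quorum rule: the largest LSN that at least k active standbys have
 * reached, i.e. the k-th largest flush LSN. Priority plays no part.
 */
static XLogRecPtr
synced_lsn_quorum(const StandbySketch *s, int n, int k)
{
    XLogRecPtr result = 0;
    int        i, j;

    for (i = 0; i < n; i++)
    {
        int count = 0;

        if (!s[i].active)
            continue;
        for (j = 0; j < n; j++)
        {
            if (s[j].active && s[j].flush >= s[i].flush)
                count++;
        }
        if (count >= k && s[i].flush > result)
            result = s[i].flush;
    }
    return result;
}
```

In the quorum function the priority field is never read, which is the
point: the result depends only on which standbys are active and how far
each has flushed.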

> ===
> > The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
> > have specific code for every prioritizing method (which are
> > priority, quorum, nested and so on). Is there any reason to use it as
> > a callback of SyncGroupNode?
>
> The reason why the current code is so is that current code is for only
> priority method supporting.
> At first version of this feature, I'd like to implement it more simple.
>
> Aside from this, of course I'm planning to have specific code for nested design.
> - The group can have some name nodes or group nodes.
> - The group can use either 2 types of method: priority or quorum.
> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
> - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
> at that moment using group's method.
> - SyncRepGetStandbysFn() function returns standbys of its group,
> which are considered as sync using group's method.
>
> For example, s_s_name = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
> memory structure will be,
>
> "main(quorum)" --- "a"
> |
> -- "b"
> |
> -- "group1(priority)" --- "c"
> |
> -- "d"
>
> When determine synced LSNs, we need to consider group1's LSN using by
> priority method at first, and then we can determine main's LSN using
> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
> So SyncRepGetSyncedLsnsUsingPriority() function would be,

Thank you for the explanation. I *recalled* that.

> > SyncRepClearStandbyGroupList is defined in syncrep.c but the
> > other related functions are defined in syncgroup_gram.y. It would
> > be better to place them together.
>
> SyncRepClearStandbyGroupList() is used by
> check_synchronous_standby_names(), so I put this function syncrep.c.

Thanks.

> > SyncRepStandbys are to be in multilevel and the struct is
> > naturally allowed to be so but SyncRepClearStandbyGroupList
> > assumes it in single level.
>
> Because I think that we don't need to implement to fully support
> nested style at first version.
> We have to carefully design this feature while considering
> expandability, but overkill implementation could be cause of crash.
> Consider remaining time for 9.6, I feel we could implement quorum
> method at best.

Yes, so I proposed to add an Assert() in the function.
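
What I have in mind is something like the following sketch. The node
layout (NodeSketch and its fields) is purely hypothetical and does not
match the patch's SyncGroupNode, and the standard assert() and free()
stand in for Assert() and pfree().

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_NAME, NODE_GROUP } NodeKindSketch;

/* Hypothetical stand-in for a group/name node. */
typedef struct NodeSketch
{
    NodeKindSketch     kind;
    struct NodeSketch *next;      /* next sibling */
    struct NodeSketch *members;   /* first member when kind == NODE_GROUP */
} NodeSketch;

/*
 * Free the members of a single-level group and return how many were
 * freed. The assertion documents, and enforces in debug builds, that
 * nested groups are not handled yet.
 */
static int
clear_group_single_level(NodeSketch *group)
{
    NodeSketch *n;
    NodeSketch *next;
    int         freed = 0;

    assert(group != NULL && group->kind == NODE_GROUP);

    for (n = group->members; n != NULL; n = next)
    {
        assert(n->kind == NODE_NAME);   /* no nesting supported */
        next = n->next;
        free(n);
        freed++;
    }
    group->members = NULL;
    return freed;
}
```

When nesting is implemented later, the assertion is the place that
fails first and reminds us to make the function recursive.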

> > This is a comment from the aspect of abstractness of objects.
> > The callers of SyncRepGetSyncStandbysUsingPriority() need to care
> > the inside of SyncGroupNode but what the function should just
> > return seems to be the list of walsnds element. Element number is
> > useless when the SyncGroupNode nests.
> > > int
> > > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
> > This might need to expose 'volatile WalSnd*' (only pointer type)
> > outside of walsender.
> > Or it should return the list of index number of
> > *WalSndCtl->walsnds*.
>
> SyncRepGetSyncStandbysUsingPriority() already returns the list of
> index number of "WalSndCtl->walsnd" as sync_list, no?

Yes, I don't understand myself what I was trying to say here :( Maybe
I mistook what sync_list returns for an index list of
SyncGroupNode. Anyway, sorry for the noise.

> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need care the
> inside of SyncGroupNode in my design.
> Selecting sync nodes from its group doesn't depend on the type of node.
> What SyncRepGetSyncStandbyFn() should do is to select sync node from
> *its* group.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

