Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-03 18:40:31
Message-ID: CAD21AoAr=-ZECe-95tO+KrK8iNSdpn0r2qYKJtoYiZFXFembBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Hi,
>
> Thank you so much for reviewing this patch!
>
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
>
>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>> such appliction names are not allowed anyway. If you assume so,
>> I'd like to see a comment for that.).
>
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
> application_name
> ------------------
> abc^Id
> (1 row)
>
>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> char ychar) requires differnt character types. Is there any reason
>> for that?
>
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
>> I personally don't like addlit*string() things for such simple
>> syntax but itself is acceptble enough for me. However it uses
>> StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> of memory chunk for every double-quoted name. The chunks are
>> finally stacked up left uncollected until the current
>> memorycontext is deleted or reset (It is deleted just after
>> finishing config file processing). Addition to that, setting
>> s_s_names runs the parser twice. It seems to me too greedy and
>> seems that static char [NAMEDATALEN] is enough using the v12 way
>> without palloc/repalloc.
>
> I though that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?
>
>> I found that the name SyncGroupName.wait_num is not
>> instinctive. How about sync_num, sync_member_num or
>> sync_standby_num? If the last is preferable, .members also should
>> be .standbys .
>
> Thanks, sync_num is preferable to me.
>
> ===
>> I am quite uncomfortable with the existence of
>> WanSnd.sync_standby_priority. It represented the pirority in the
>> old linear s_s_names format but nested groups or even
>> single-level quarum list obviously doesn't fit it. Can we get rid
>> of sync_standby_priority, even though we realize atmost
>> n-priority for now?
>
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.
>
> ===
>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>> have specific code for every prioritizing method (which are
>> priority, quorum, nested and so). Is there any reson to use it as
>> a callback of SyncGroupNode?
>
> The reason why the current code is so is that current code is for only
> priority method supporting.
> At first version of this feature, I'd like to implement it more simple.
>
> Aside from this, of course I'm planning to have specific code for nested design.
> - The group can have some name nodes or group nodes.
> - The group can use either 2 types of method: priority or quorum.
> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
> - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
> at that moment using group's method.
> - SyncRepGetStandbysFn() function returns standbys of its group,
> which are considered as sync using group's method.
>
> For example, s_s_name = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
> memory structure will be,
>
> "main(quorum)" --- "a"
> |
> -- "b"
> |
> -- "group1(priority)" --- "c"
> |
> -- "d"
>
> When determine synced LSNs, we need to consider group1's LSN using by
> priority method at first, and then we can determine main's LSN using
> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>
> bool
> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
> {
> sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);
>
> if (sync_num < group->sync_num)
> return false;
>
> for (each member of sync_list)
> {
> if (member->type == group node)
> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
> f into lsn_list.
> else
> Store name node LSNs into lsn_list.
> }
>
> Determine synced LSNs of this group using lsn_list and priority method.
> Store synced LSNs into write_lsn and flush_lsn.
> return true;
> }
>
>> SyncRepClearStandbyGroupList is defined in syncrep.c but the
>> other related functions are defined in syncgroup_gram.y. It would
>> be better to place them together.
>
> SyncRepClearStandbyGroupList() is used by
> check_synchronous_standby_names(), so I put this function syncrep.c.
>
>> SyncRepStandbys are to be in multilevel and the struct is
>> naturally allowed to be so but SyncRepClearStandbyGroupList
>> assumes it in single level.
>
> Because I think that we don't need to implement to fully support
> nested style at first version.
> We have to carefully design this feature while considering
> expandability, but overkill implementation could be cause of crash.
> Consider remaining time for 9.6, I feel we could implement quorum
> method at best.
>
>> This is a comment from the aspect of abstractness of objects.
>> The callers of SyncRepGetSyncStandbysUsingPriority() need to care
>> the inside of SyncGroupNode but what the function should just
>> return seems to be the list of wansnds element. Element number is
>> useless when the SyncGroupNode nests.
>> > int
>> > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
>> This might need to expose 'volatile WalSnd*' (only pointer type)
>> outside of walsender.
>> Or it should return the list of index number of
>> *WalSndCtl->walsnds*.
>
> SyncRepGetSyncStandbysUsingPriority() already returns the list of
> index number of "WalSndCtl->walsnd" as sync_list, no?
> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need care the
> inside of SyncGroupNode in my design.
> Selecting sync nodes from its group doesn't depend on the type of node.
> What SyncRepGetSyncStandbyFn() should do is to select sync node from
> *its* group.
>

Previous patch has bug around GUC parameter handling.
Attached updated version.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_multi_sync_replication_v15.patch application/octet-stream 44.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-03-03 19:05:09 Re: improving GROUP BY estimation
Previous Message Robert Haas 2016-03-03 18:20:05 Re: POC: Cache data in GetSnapshotData()