Re: Support for N synchronous standby servers - take 2

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-02-29 10:24:14
Message-ID: 20160229.192414.166736236.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry, I misread the previous patch. It actually worked.

At Sun, 28 Feb 2016 04:04:37 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB69-tNLVzKRZ0Opzsr6LcLY36GJ2tHGohW33Btq3yRsw(at)mail(dot)gmail(dot)com>
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.

Thank you for the new patch. The parser almost looks to work as
expected, but the following warnings were seen on build.

> In file included from syncgroup_gram.y:138:0:
> syncgroup_scanner.l:23:12: warning: ‘xd_size’ defined but not used [-Wunused-variable]
> static int xd_size; /* actual size of xd_string */
> ^
> syncgroup_scanner.l:24:12: warning: ‘xd_len’ defined but not used [-Wunused-variable]
> static int xd_len; /* string length of xd_string */

Some random comments follow.

Commnents for the lexer part.

===
> +node_name [^\ \,\[\]]

This accepts 'abc^Id' as a name, which is wrong behavior (but
such appliction names are not allowed anyway. If you assume so,
I'd like to see a comment for that.). And the excessive escaping
make it hard to read a bit. The pattern can be written as the
following more precisely. (but I don't know whether it is
generally easy to read..)

| node_name [\x20-\x7f]{-}[ \[\],]

===
The pattern name {node_name} gives me a bit
uneasiness. node_name_cont or name_chars would be preferable.

===
> [1-9][0-9]* {

I see no necessity to inhibit 0-prefixed integers as NUM. Would
you mind allowing [0-9]+ there?

===
addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
char ychar) requires differnt character types. Is there any reason
for that?

===
I personally don't like addlit*string() things for such simple
syntax but itself is acceptble enough for me. However it uses
StringInfo to hold double-quoted names, which pallocs 1024 bytes
of memory chunk for every double-quoted name. The chunks are
finally stacked up left uncollected until the current
memorycontext is deleted or reset (It is deleted just after
finishing config file processing). Addition to that, setting
s_s_names runs the parser twice. It seems to me too greedy and
seems that static char [NAMEDATALEN] is enough using the v12 way
without palloc/repalloc.

Comments for parser part.

===
The rule "result" in syncgruop_gram.y sets malloced chunk to
SyncRepStandbys ignoring exiting content so repetitive setting to
the gud s_s_names causes a memory leak. Using
SyncRepClearStandbyGroupList would be enough.

===
The meaning of SyncGroupNode.type seems oscure. The member seems
to be referred to decide how to treat the node, but the following
code will break the assumption.

> group_node->type = SYNC_REP_GROUP_GROUP | SYNC_REP_GROUP_MAIN;

It seems me that *_MAIN is an equivalent of *_GROUP &&
sync_method = *_PRIORITY. If so, *_MAIN is useless. The reader of
SyncGroupNode needs not to see wheter it was in traditional
s_s_names or in new format.

===
Bare names in s_s_names are down-cased and double-quoted ones are
not. The parser of this patch doesn't for both.

===
xd_stringdup() doesn't make a copy of the string against its
name. It's error-prone.

===
I found that the name SyncGroupName.wait_num is not
instinctive. How about sync_num, sync_member_num or
sync_standby_num? If the last is preferable, .members also should
be .standbys .

Comment for the quorum commit body part.
===
I am quite uncomfortable with the existence of
WanSnd.sync_standby_priority. It represented the pirority in the
old linear s_s_names format but nested groups or even
single-level quarum list obviously doesn't fit it. Can we get rid
of sync_standby_priority, even though we realize atmost
n-priority for now?

===
The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
have specific code for every prioritizing method (which are
priority, quorum, nested and so). Is there any reson to use it as
a callback of SyncGroupNode?

Others - random commnets
===
SyncRepClearStandbyGroupList is defined in syncrep.c but the
other related functions are defined in syncgroup_gram.y. It would
be better to place them together.

===
SyncRepStandbys are to be in multilevel and the struct is
naturally allowed to be so but SyncRepClearStandbyGroupList
assumes it in single level. Make the function to free multilevel
or explicitly inhibit multilevel using asserttion.

===
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby(s).")));

The message doesn't contain specific number of standbys so just
using plural seems to be enough for me. And besides, the message
should describe the situation more precisely. Word correction is
left to anyone else:)

+ errdetail("The transaction has already committed locally, but might not have been replicated to some of the required standbys.")));

===
+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority.

"active" means not only defined priority but also have informed
WAL flush position.

+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority and valid flush position reported.

===
If there's no reason for SyncRepStandbyIsSync not to take WalSnd
directly, taking walsnd is simpler.

static bool SyncRepStandbyIsSync(volatile WalSnd *walsnd);

===
> * Update the LSNs on each queue based upon our latest state. This
> * implements a simple policy of first-valid-standby-releases-waiter.
> *
> * Other policies are possible, which would change what we do here and what
> * perhaps also which information we store as well.
> */
> void
> SyncRepReleaseWaiters(void)

This comment looks wrong for the new code.

===
> * Select low priority standbys from walsnds array. If there are same
> * priority standbys, first defined standby is selected. It's possible
> * to have same priority different standbys, so we can not break loop
> * even when standby having target_prioirty priority is found.

"low priority" here seems to be a mistake of "high priority
standbys" or "standbys with low priority value".

> * Returns the list of standbys in sync up to the number that
> * required to satisfy synchronous_standby_names. If there
> * are standbys with the same priority values, the first
> * defined ones are selected. It's possible for multiple
> * standbys to have a same priority value when multiple
> * walreceiver gives the same name, so we do not break the
> * inner loop just by finding a standby with the
> * target_priority.

===
> /* Got enough synchronous stnadby */

"staneby" => "standbys"

===
This is a comment from the aspect of abstractness of objects.
The callers of SyncRepGetSyncStandbysUsingPriority() need to care
the inside of SyncGroupNode but what the function should just
return seems to be the list of wansnds element. Element number is
useless when the SyncGroupNode nests.

> int
> SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)

This might need to expose 'volatile WalSnd*' (only pointer type)
outside of walsender.

Or it should return the list of index number of
*WalSndCtl->walsnds*.

===
The dependency definition seems to be wrong in Makefile so

editing related files won't cause appropriate
compilation. syncgroup_gram.h and syncgroup_gram.c are generated
at once from the .y file. and syncgroup_gram.o is generated from
syncgroup_gram.c and syncgroup_scanner.c.

-syncgroup_gram.o: syncgroup_scanner.c
-
-syncgroup_gram.h: syncgroup_gram.c ;
+syncgroup_gram.o: syncgroup_scanner.c syncgroup_gram.c

===
In pg_stat_get_wal_senders, the num_sync looks to have a chance
to be used uninitialized but I don't know why the compiler don't
complain about it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-02-29 10:42:03 Re: WIP: Access method extendability
Previous Message Rajkumar Raghuwanshi 2016-02-29 10:19:49 Issue with NULLS LAST, with postgres_fdw sort pushdown