Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-24 13:29:01
Message-ID: CAD21AoCxwezOTf9kLQRhuf2y=1c_fGjCormqJfqHOmQW8EgaDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 24, 2016 at 2:26 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBVn3_5qC_CKeKSXTu963mM=n9-GxzF7KCPreTTMS+JGQ(at)mail(dot)gmail(dot)com>
>> On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
>> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >>> I don't think it's so difficult to extend this version so that
>> >>> it supports also quorum commit.
>> >>
>> >> Mmm. I think I understand this just now. As Sawada-san said
>> >> before, all standbys in a single-level quorum set having the same
>> >> sync_standby_prioirity, the current algorithm works as it is. It
>> >> also true for the case that some quorum sets are in a priority
>> >> set.
>> >>
>> >> What about some priority sets in a quorum set?
>>
>> We should surely consider it that when we support more than 1 nest
>> level configuration.
>> IMO, we can have another information which indicates current sync
>> standbys instead of sync_priority.
>> For now, we are'nt trying to support even quorum method, so we could
>> consider it after we can support both priority method and quorum
>> method without incident.
>
> Fine with me.
>
>> >>> > StringInfo for double-quoted names seems to me to be overkill,
>> >>> > since it allocates 1024 byte block for every such name. A static
>> >>> > buffer seems enough for the usage as I said.
>> >>>
>> >>> So, what about changing the scanner code as follows?
>> >>>
>> >>> <xd>{xdstop} {
>> >>> yylval.str = pstrdup(xdbuf.data);
>> >>> pfree(xdbuf.data);
>> >>> BEGIN(INITIAL);
>> >>> return NAME;
>> >>>
>> >>> > The parser is called for not only for SIGHUP, but also for
>> >>> > starting of every walsender. The latter is not necessary but it
>> >>> > is the matter of trade-off between simplisity and
>> >>> > effectiveness.
>> >>>
>> >>> Could you elaborate why you think that's not necessary?
>> >>
>> >> Sorry, starting of walsender is not so large problem, 1024 bytes
>> >> memory is just abandoned once. SIGHUP is rather a problem.
>> >>
>> >> The part is called under two kinds of memory context, "config
>> >> file processing" then "Replication command context". The former
>> >> is deleted just after reading the config file so no harm but the
>> >> latter is a quite long-lasting context and every reloading bloats
>> >> the context with abandoned memory blocks. It is needed to be
>> >> pfreed or to use a memory context with shorter lifetime, or use
>> >> static storage of 64 byte-length, even though the bloat become
>> >> visible after very many times of conf reloads.
>> >
>> > SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
>> > in the patch. Or am I missing something?
>
> Sorry, instead, the memory from strdup() will be abandoned in
> upper level. (Thinking for some time..) Ah, I found that the
> problem should be here.
>
> > SyncRepFreeConfig(SyncRepConfigData *config)
> > {
> ...
> !> list_free(config->members);
> > pfree(config);
> > }
>
> The list_free *doesn't* free the memory blocks pointed by
> lfirst(cell), which has been pstrdup'ed. It should be
> list_free_deep(config->members) instead to free it completely.
>> >>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>> >>> startup. A child process takes over the internal data struct for the parsed
>> >>> s_s_names when it's forked by the postmaster. This is what the previous
>> >>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>> >>> In that environment, the data struct should be passed to a child process via
>> >>> the special file (like write_nondefault_variables() does), or it should
>> >>> be constructed during walsender startup (like latest version of the patch
>> >>> does). IMO the latter is simpler.
>> >>
>> >> Ah, I haven't notice that but I agree with it.
>> >>
>> >>
>> >> As per my previous comment, syncrep_scanner.l doesn't reject some
>> >> (nonprintable and multibyte) characters in a name, which is to be
>> >> silently replaced with '?' for application_name. It would not be
>> >> a problem for almost all of us but might be needed to be
>> >> documented if we won't change the behavior to be the same as
>> >> application_name.
>> >
>> > There are three options:
>> >
>> > 1. Replace nonprintable and non-ASCII characters in s_s_names with ?
>> > 2. Emit an error if s_s_names contains nonprintable and non-ASCII characters
>> > 3. Do nothing (9.5 or before behave in this way)
>> >
>> > You implied that we should choose #1 or #2?
>>
>> Previous(9.5 or before) s_s_names also accepts non-ASCII character and
>> non-printable character, and can show it without replacing these
>> character to '?'.
>
> Thank you for pointint it out (it was completely out of my
> mind..). I have no objection to keep the previous behavior.
>
>> From backward compatibility perspective, we should not choose #1 or #2.
>> Different behaviour between previous and current s_s_names is that
>> previous s_s_names doesn't accept the node name having the sort of
>> white-space character that isspace() returns true with.
>> But current s_s_names allows us to specify such a node name.
>> I guess that changing such behaviour is enough for fixing this issue.
>> Thoughts?
>

Attached latest patch incorporating all review comments so far.

Aside from the review comments, I did following changes;
- Add logic to avoid fatal exit in yy_fatal_error().
- Improve regression test cases.

Also I felt a sense of discomfort regarding using [ and ] as a special
character for priority method.
Because (, ) and [, ] are a little similar each other, so it would
easily make many syntax errors when nested style is supported.
And the synopsis of that in documentation is odd;
synchronous_standby_names = 'N [ node_name [, ...] ]'

This topic has been already discussed before but, we might want to
change it to other characters such as < and >?

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
muti_sync_replication_v19.patch text/x-diff 41.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2016-03-24 13:50:45 Small patch: Change calling convention for ShmemInitHash (and fix possible bug)
Previous Message Amit Kapila 2016-03-24 13:16:31 Re: Speed up Clog Access by increasing CLOG buffers