Re: Support for N synchronous standby servers - take 2

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-29 07:51:02
Message-ID: CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
> sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Thank you for the new patch. Sorry to have overlooked some
>> > versions. I'm looking the v19 patch now.
>> >
>> > make complains for an unused variable.
>
> Thank you. I'll have a closer look on it a bit later.
>
>> >> Attached latest patch incorporating all review comments so far.
>> >>
>> >> Aside from the review comments, I did following changes;
>> >> - Add logic to avoid fatal exit in yy_fatal_error().
>> >
>> > Maybe good catch, but..
>> >
>> >> syncrep_scanstr(const char *str)
>> > ..
>> >> * Regain control after a fatal, internal flex error. It may have
>> >> * corrupted parser state. Consequently, abandon the file, but trust
>> > ~~~~~~~~~~~~~~~~
>> >> * that the state remains sane enough for syncrep_yy_delete_buffer().
>> > ~~~~~~~~~~~~~~~~~~~~~~~~
>> >
>> > guc-file.l actually abandones the config file but syncrep_scanner
>> > reads only a value of an item in it. And, the latter is
>> > eventually true but a bit hard to understand.
>> >
>> > The patch will emit a mysterious error message like this.
>> >
>> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
>> >> configuration file ".../postgresql.conf" contains errors
>> >
>> > This is utterly wrong. A bit related to that, it seems to me that
>> > syncrep_scan.l doesn't need the same mechanism with
>> > guc-file.l. The nature of the modification would be making
>> > call_*_check_hook to be tri-state instead of boolean. So just
>> > cathing errors in call_*_check_hook and ereport()'ing as SQL
>> > parser does seems enough, but either will do for me.
>>
>> Well, I think that call_*_check_hook can not catch such a fatal error.
>
> As mentioned in my comment, SQL parser converts yy_fatal_error
> into ereport(ERROR), which can be caught by the upper PG_TRY (by
> #define'ing fprintf). So it is doable if you mind exit().

I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
flex fatal error occurs, postmaster just exits instead of jumping out of parser.

ISTM that, when an internal flex fatal error occurs, it's better to elog(FATAL)
and terminate the problematic process. This might lead to the server crash
(e.g., if postmaster emits a FATAL error, it and its all child processes will
exit soon). But probably we can live with this because the fatal error basically
rarely happens.

OTOH, if we make the process keep running even after it gets an internal
fatal error (like Sawada's patch or your idea do), this might cause more
serious problem. Please imagine the case where one walsender gets the fatal
error (e.g., because of OOM), abandon new setting value of
synchronous_standby_names, and keep running with the previous setting value.
OTOH, the other walsender processes successfully parse the setting and
keep running with new setting. In this case, the inconsistency of the setting
which each walsender is based on happens. This completely will mess up the
synchronous replication.

Therefore, I think that it's better to make the problematic process exit
with FATAL error rather than ignore the error and keep it running.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-03-29 07:59:01 Re: IF (NOT) EXISTS in psql-completion
Previous Message Craig Ringer 2016-03-29 07:37:18 Re: pglogical_output - a general purpose logical decoding output plugin