Re: Support for N synchronous standby servers - take 2

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: masao(dot)fujii(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-29 08:36:44
Message-ID: 20160329.173644.22077026.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

I personally don't think it needs such a survival measure. The
syntax is very small and the parser reads very short text. If the
parser fails in such a mode, something more serious must have
occurred.

At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello,
> >
> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > As mentioned in my comment, SQL parser converts yy_fatal_error
> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
> > #define'ing fprintf). So it is doable if you mind exit().
>
> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
> flex fatal error occurs, postmaster just exits instead of jumping out of parser.

The ERROR could be LOG or DEBUG2 instead, if we think the parser
fatal errors are recoverable. guc-file.l does so.
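
Just to illustrate the mechanism being discussed (the names below are
mine, not the actual patch): flex's generated yy_fatal_error() prints
its message with fprintf(stderr, ...) and then calls exit(), so
redefining fprintf inside the .l file lets us hand the message to
ereport() instead. A rough sketch:

/* in the scanner's .l prologue; illustrative only */
#include "postgres.h"

#define fprintf(file, fmt, msg)  scanner_fatal_to_ereport(fmt, msg)

static void
scanner_fatal_to_ereport(const char *fmt, const char *msg)
{
	/*
	 * ereport(ERROR) does not return, so the exit() that follows in
	 * yy_fatal_error() is never reached and the upper PG_TRY can
	 * catch the error.  If we reported at LOG or DEBUG2 instead, the
	 * handler itself would have to jump out of the scanner, which is
	 * roughly what guc-file.l arranges for.
	 */
	ereport(ERROR,
			(errmsg_internal("%s", msg)));
}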

> ISTM that, when an internal flex fatal error occurs, it's
> better to elog(FATAL) and terminate the problematic
> process. This might lead to the server crash (e.g., if
> postmaster emits a FATAL error, it and its all child processes
> will exit soon). But probably we can live with this because the
> fatal error basically rarely happens.

I agree with this.

> OTOH, if we make the process keep running even after it gets an internal
> fatal error (like Sawada's patch or your idea do), this might cause more
> serious problem. Please imagine the case where one walsender gets the fatal
> error (e.g., because of OOM), abandon new setting value of
> synchronous_standby_names, and keep running with the previous setting value.
> OTOH, the other walsender processes successfully parse the setting and
> keep running with new setting. In this case, the inconsistency of the setting
> which each walsender is based on happens. This completely will mess up the
> synchronous replication.

On the other hand, guc-file.l seems to ignore parser errors during
normal operation, even though that may cause a similar inconsistency,
if any.

| LOG: received SIGHUP, reloading configuration files
| LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
| LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
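
The pattern there is roughly the following (a simplified sketch of the
reload behavior, not the real guc-file.l/guc.c code;
parse_config_to_list() and apply_parsed_values() are made-up names):

#include "postgres.h"

/* hypothetical helpers standing in for the real parse/apply steps */
extern bool parse_config_to_list(const char *conffile, void **parsed);
extern void apply_parsed_values(void *parsed);

static void
reload_config_sketch(const char *conffile)
{
	void	   *parsed = NULL;

	if (!parse_config_to_list(conffile, &parsed))
	{
		/* scanner/parser error: report it and apply nothing */
		ereport(LOG,
				(errmsg("configuration file \"%s\" contains errors; "
						"no changes were applied", conffile)));
		return;
	}

	/* new values are applied only when the whole file parsed cleanly */
	apply_parsed_values(parsed);
}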

> Therefore, I think that it's better to make the problematic process exit
> with FATAL error rather than ignore the error and keep it running.

+1. Restarting a walsender would be far less harmful than keeping
it running in a doubtful state.

Should I wait for the next version, or have a look at the latest one?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
