Re: Support for N synchronous standby servers - take 2

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-03-31 04:11:40
Message-ID: CAEepm=0Gufk1p1Mk6-2y361hpQ6cpqq3+sKckfam=RbGhY-HvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> I personally don't think it needs such a survive measure. It is
>>> very small syntax and the parser reads very short text. If the
>>> parser failes in such mode, something more serious should have
>>> occurred.
>>>
>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> > Hello,
>>>> >
>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>
>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>
>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>> fatal erros are recoverable. guc-file.l is doing so.
>>>
>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>> better to elog(FATAL) and terminate the problematic
>>>> process. This might lead to the server crash (e.g., if
>>>> postmaster emits a FATAL error, it and its all child processes
>>>> will exit soon). But probably we can live with this because the
>>>> fatal error basically rarely happens.
>>>
>>> I agree to this
>>>
>>>> OTOH, if we make the process keep running even after it gets an internal
>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>> error (e.g., because of OOM), abandon new setting value of
>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>> OTOH, the other walsender processes successfully parse the setting and
>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>> which each walsender is based on happens. This completely will mess up the
>>>> synchronous replication.
>>>
>>> On the other hand, guc-file.l seems ignoring parser errors under
>>> normal operation, even though it may cause similar inconsistency,
>>> if any..
>>>
>>> | LOG: received SIGHUP, reloading configuration files
>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>>
>>>> Therefore, I think that it's better to make the problematic process exit
>>>> with FATAL error rather than ignore the error and keep it running.
>>>
>>> +1. Restarting walsender would be far less harmful than keeping
>>> it running in doubtful state.
>>>
>>> Sould I wait for the next version or have a look on the latest?
>>>
>>
>> Attached latest patch incorporate some review comments so far, and is
>> rebased against current HEAD.
>>
>
> Sorry I attached wrong patch.
> Attached patch is correct patch.
>
> [mulit_sync_replication_v21.patch]

Here are some TPS numbers from some quick tests I ran on a set of
Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
as primary + 3 standbys, to try out different combinations of
synchronous_commit levels and synchronous_standby_names numbers. They
were run for a short time only and these are of course systems with
limited and perhaps uneven IO and CPU, but they still give some idea
of the trends. And reassuringly, the trends are travelling in the
expected directions.

All default settings except shared_buffers = 1GB, and the GUCs
required for replication.

pgbench postgres -j2 -c2 -N bench2 -T 600

1(*) 2(*) 3(*)
==== ==== ====
off = 4056 4096 4092
local = 1323 1299 1312
remote_write = 1130 1046 958
on = 860 744 701
remote_apply = 785 725 604

pgbench postgres -j16 -c16 -N bench2 -T 600

1(*) 2(*) 3(*)
==== ==== ====
off = 3952 3943 3933
local = 2964 2984 3026
remote_write = 2790 2724 2675
on = 2731 2627 2523
remote_apply = 2627 2501 2432

One thing I noticed is that there are LOG messages telling me when a
standby becomes a synchronous standby, but nothing to tell me if a
standby stops being a standby (ie because a higher priority one has
taken its place in the quorum). Would that be interesting?

Also, I spotted some tiny mistakes:

+ <indexterm zone="high-availability">
+ <primary>Dedicated language for multiple synchornous replication</primary>
+ </indexterm>

s/synchornous/synchronous/

+ /*
+ * If we are managing the sync standby, though we weren't
+ * prior to this, then announce we are now the sync standby.
+ */

s/ the / a / (two occurrences)

+ ereport(LOG,
+ (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
+ application_name, MyWalSnd->sync_standby_priority)));

s/ the / a /

offered by a transaction commit. This level of protection is referred
- to as 2-safe replication in computer science theory.
+ to as 2-safe replication in computer science theory, and group-1-safe
+ (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
+ more than <literal>remote_write</>.

Why "more than"? I think those two words should be changed to "at
least", or removed.

+ <para>
+ This syntax allows us to define a synchronous group that will wait for at
+ least N standbys of them, and a comma-separated list of group
members that are surrounded by
+ parantheses. The special value <literal>*</> for server name
matches any standby.
+ By surrounding list of group members using parantheses,
synchronous standbys are chosen from
+ that group using priority method.
+ </para>

s/parantheses/parentheses/ (two occurrences)

+ <sect2 id="dedicated-language-for-multi-sync-replication-priority">
+ <title>Prioirty Method</title>

s/Prioirty Method/Priority Method/

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2016-03-31 04:11:46 Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data
Previous Message Alvaro Herrera 2016-03-31 02:44:00 Re: Very small patch for decode.c