Re: Support for N synchronous standby servers - take 2

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-04-04 08:28:07
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>> I personally don't think it needs such a survive measure. It is
>>>>> very small syntax and the parser reads very short text. If the
>>>>> parser failes in such mode, something more serious should have
>>>>> occurred.
>>>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>>> > Hello,
>>>>>> >
>>>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>>>> fatal erros are recoverable. guc-file.l is doing so.
>>>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>>>> better to elog(FATAL) and terminate the problematic
>>>>>> process. This might lead to the server crash (e.g., if
>>>>>> postmaster emits a FATAL error, it and its all child processes
>>>>>> will exit soon). But probably we can live with this because the
>>>>>> fatal error basically rarely happens.
>>>>> I agree to this
>>>>>> OTOH, if we make the process keep running even after it gets an internal
>>>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>>>> error (e.g., because of OOM), abandon new setting value of
>>>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>>>> OTOH, the other walsender processes successfully parse the setting and
>>>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>>>> which each walsender is based on happens. This completely will mess up the
>>>>>> synchronous replication.
>>>>> On the other hand, guc-file.l seems ignoring parser errors under
>>>>> normal operation, even though it may cause similar inconsistency,
>>>>> if any..
>>>>> | LOG: received SIGHUP, reloading configuration files
>>>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>>>>> Therefore, I think that it's better to make the problematic process exit
>>>>>> with FATAL error rather than ignore the error and keep it running.
>>>>> +1. Restarting walsender would be far less harmful than keeping
>>>>> it running in doubtful state.
>>>>> Sould I wait for the next version or have a look on the latest?
>>>> Attached latest patch incorporate some review comments so far, and is
>>>> rebased against current HEAD.
>>> Sorry I attached wrong patch.
>>> Attached patch is correct patch.

Thanks for updating the patch!

I applied the following changes to the patch.
Attached is the revised version of the patch.

- Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on
the recent discussion with Horiguchi-san.
- Improved the documentation.
- Fixed some bugs.
- Removed the changes for recovery testing framework. I'd like to commit
those changes later separately from the main patch of multiple sync rep.

Barring any objections, I'll commit this patch.

>> One thing I noticed is that there are LOG messages telling me when a
>> standby becomes a synchronous standby, but nothing to tell me if a
>> standby stops being a standby (ie because a higher priority one has
>> taken its place in the quorum). Would that be interesting?


>> Also, I spotted some tiny mistakes:
>> + <indexterm zone="high-availability">
>> + <primary>Dedicated language for multiple synchornous replication</primary>
>> + </indexterm>
>> s/synchornous/synchronous/

Confirmed that there is no typo "synchornous" in the latest patch.

>> + /*
>> + * If we are managing the sync standby, though we weren't
>> + * prior to this, then announce we are now the sync standby.
>> + */
>> s/ the / a / (two occurrences)


>> + ereport(LOG,
>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>> + application_name, MyWalSnd->sync_standby_priority)));
>> s/ the / a /

I have no objection to this change itself. But we have used this message
in 9.5 or before, so if we apply this change, probably we need

>> offered by a transaction commit. This level of protection is referred
>> - to as 2-safe replication in computer science theory.
>> + to as 2-safe replication in computer science theory, and group-1-safe
>> + (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
>> + more than <literal>remote_write</>.
>> Why "more than"? I think those two words should be changed to "at
>> least", or removed.


>> + <para>
>> + This syntax allows us to define a synchronous group that will wait for at
>> + least N standbys of them, and a comma-separated list of group
>> members that are surrounded by
>> + parantheses. The special value <literal>*</> for server name
>> matches any standby.
>> + By surrounding list of group members using parantheses,
>> synchronous standbys are chosen from
>> + that group using priority method.
>> + </para>
>> s/parantheses/parentheses/ (two occurrences)

Confirmed that this typo doesn't exist in the latest patch.

>> + <sect2 id="dedicated-language-for-multi-sync-replication-priority">
>> + <title>Prioirty Method</title>
>> s/Prioirty Method/Priority Method/

Confirmed that this typo doesn't exist in the latest patch.

> A couple more comments:
> /*
> - * If we aren't managing the highest priority standby then just leave.
> + * If the number of sync standbys is less than requested or we aren't
> + * managing the sync standby then just leave.
> */
> - if (syncWalSnd != MyWalSnd)
> + if (!got_oldest || !am_sync)
> s/ the sync / a sync /


> + /*
> + * Consider all pending standbys as sync if the number of them plus
> + * already-found sync ones is lower than the configuration requests.
> + */
> + if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
> + return list_concat(result, pending);
> The cells from 'pending' will be attached to 'result', and 'result'
> will be freed by the caller. But won't the List header object from
> 'pending' be leaked?

Yes if 'result' is not NIL. I added pfree(pending) for that case.

> + result = lappend_int(result, i);
> + if (list_length(result) == SyncRepConfig->num_sync)
> + {
> + list_free(pending);
> + return result; /* Exit if got enough sync standbys */
> + }
> If we didn't take the early return in the list-not-long-enough case
> mentioned above, we should *always* exit via this return statement,
> right? Since we know that the pending list had enough elements to
> reach num_sync. I think that is worth a comment, and also a "not
> reached" comment at the bottom of the function, if it is true.

Good catch! I added the comments. Also added Assert(false) at
the bottom of the function.

> As a future improvement, I wonder if we could avoid recomputing the
> current set of sync standbys in every walsender every time we call
> SyncRepReleaseWaiters, perhaps by maintaining that set incrementally
> in shmem when walsender states change etc.


> I don't have any other comments, other than to say: thank you to all
> the people who have contributed to this feature so far and I really
> really hope it goes into 9.6!



Fujii Masao

Attachment Content-Type Size
multi_sync_replication_v22.patch text/x-patch 47.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-04-04 08:45:01 Re: postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
Previous Message Simon Riggs 2016-04-04 08:25:41 Re: PATCH: use foreign keys to improve join estimates v1