Re: Commits don't block for synchronous replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Xin Zhang <xzhang(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Asim Praveen <apraveen(at)pivotal(dot)io>
Subject: Re: Commits don't block for synchronous replication
Date: 2017-09-20 02:07:31
Message-ID: CAD21AoDLU3pK7boR10sTq0uWfzhu49TJtm62uJkx271YTkNQxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 19, 2017 at 2:26 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Sep 19, 2017 at 7:50 AM, Xin Zhang <xzhang(at)pivotal(dot)io> wrote:
>> If primary crashed at that moment, and failover to standby, the foo table is
>> lost, even though the replication is synced according to
>> `pg_stat_replication` view.
>
> GUC parameters are reloaded each time a query is run, and so
> SyncRepConfig is filled with the parsed data of SyncRepStandbyNames
> once the parameter is reloaded for the process. Still, here, a commit
> is waiting for a signal from a WAL sender that the wanted LSN has been
> correctly flushed on a standby so this code path does not care about
> the state of SyncRepConfig saved in the context of the process, we
> want to know what the checkpointer thinks about it. Hence using WAL
> sender data or sync_standbys_defined as a source of truth looks like a
> correct concept to me, making the problem of this bug legit.
>

I agree with the analysis and the approach of this patch.

> The check with SyncRepRequested() still holds truth: max_wal_senders
> needs a restart to be updated. Also, the other caller of
> SyncStandbysDefined() requires SyncRepConfig to be set, so this caller
> is fine.

Yeah, after reloaded the config there might be some wal senders that
don't reflect it yet but I think it cannot be a problem.

> I have looked at your patch and tested it, but found no problems
> associated with it. A backpatch would be required, so I have added an
> entry in the next commit fest with status set to "ready for committer"
> so as this bug does not fall into the cracks.

Also I found no problems in the patch.

>
>> A separate question, is the `pg_stat_replication` view the reliable way to
>> find when to failover to a standby, or there are some other ways to ensure
>> the standby is in-sync with the primary?
>
> It shows at SQL level what is currently present in shared memory by
> scanning all the WAL sender entries, so this report uses the same data
> as the backend themselves, so that's a reliable source. In Postgres
> 10, pg_stat_activity is also able to show to users what are the
> backends waiting for a change to be flushed/applied on the standby
> using the wait event called SyncRep. You could make some use of that
> as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-20 02:14:32 Re: sync process names between ps and pg_stat_activity
Previous Message Robert Haas 2017-09-20 02:01:54 Re: Boom filters for hash joins (was: A design for amcheck heapam verification)