Re: synchronous_commit = apply

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: synchronous_commit = apply
Date: 2015-09-16 12:50:25
Message-ID: CANP8+j+jCpNoOjc-KQLtt4PDyOX2Sq6wYWqCSy6aaHWkvNa0hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 September 2015 at 20:25, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> Do you think it's reasonable to want to COMMIT a particular transaction on
> a master node, and then immediately run a read-only query on a hot standby
> node that is guaranteed to see that transaction?
>

Yes, that is reasonable and we've been discussing it for a few years now.

> A friend of mine who works with a different RDBMS technology that can do
> that asked me how to achieve this with Postgres, and I suggested waiting
> for the standby's pg_last_xlog_replay_location() to be >= the master's
> pg_current_xlog_location() after COMMIT, which might involve some looping
> and sleeping.
>
> As a quick weekend learning exercise/hack I recently went looking into how
> we could support $SUBJECT. I discovered we already report the apply
> progress back to the master, and the synchronous waiting facility seemed to
> be all ready to support this. In fact it seemed a little too easy so
> something tells me it must be wrong! But anyway, please see the attached
> toy POC patch which does that.
>

As you say, that is the easy part.

> The next problem is that the master can be waiting quite a long time for a
> reply from the remote walreceiver containing the desired apply LSN: in the
> best case it learns of apply progress from replies to subsequent unrelated
> records (which might be very soon on a busy system but still involves
> waiting for the next transaction's WAL flush), and in the worst case it
> needs to wait for wal_receiver_status_interval (10 seconds by default),
> which makes for a long COMMIT delay. I was thinking that the solution to
> that may be to teach StartupLOG to signal the walreceiver after it updates
> XLogCtl->lastReplayedEndRecPtr, which should cause walrcv_receive to be
> interrupted and return early, and then walreceiver could send a reply if it
> sees that lastReplayedEndRecPtr has moved. Maybe that would generate an
> unacceptably high frequency of signals, and maybe there is a better form of
> IPC for this. Without introducing any new IPC, the walreceiver could
> instead simply report apply progress to the master whenever it sees that
> the apply LSN has moved after its regular NAPTIME_PER_CYCLE wait (100ms),
> but that would still introduces bogus latency. A quick and dirty way to
> see that on top of the attached patch is to set requestReply = true in
> WalReceiverMain to force a send after every nap.
>

This problem is exactly why I wrote my recent patch to make WALWriter work
in recovery.

Currently, the WALReceiver issues regular fsyncs that prevent it from
replying in time. Also, the WALReceiver waits on incoming data only, so we
can't (yet) set a latch when the Startup process has applied some records.

I've solved the first problem and know how to solve the second, just
haven't coded it yet. I was expecting to do that for CF3 or CF4.

I don't think we should be using signals, nor would I expect them to work
effectively while in an fsync.

> I can see that using synchronous_commit = apply in the practice might
> prove difficult: how does a client know which node is the synchronous
> standby? Perhaps those sorts of practical problems are the reason no one
> has done or wanted this.
>
It means we need quorum sync rep as well, to make this useful in practice
without sacrificing HA.

Bringing my patch and Beena's patch together will solve this for us in 9.6

So yes, 1) we have thought of it and want it, 2) the basic patch is
trivial, 3) but it isn't the main problem.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2015-09-16 13:01:26 Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message Thom Brown 2015-09-16 11:21:32 Re: Sequence Access Method WIP