Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2016-03-28 07:54:18
Message-ID: CAB7nPqSAAB3yJuF87aFzB2tG9FQ3u+7Carmyi7nPu4HzLtqQnw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 27, 2016 at 7:30 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Mar 26, 2016 at 2:48 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Should we be worried about potential backward incompatibilities with the
>> new return values of walrcv_receive?
>
> There are three changes to the walrcv_receive interface:
>
> 1. It now takes a latch pointer, which may be NULL.
>
> 2. If the latch pointer is non-NULL, the existing function might
> return a new sentinel value WALRCV_RECEIVE_LATCH_SET. (The
> pre-existing sentinel value -1 is still in use and has the same value
> and meaning as before, but it now has a name:
> WALRCV_RECEIVE_COPY_ENDED.)
>
> 3. It will no longer return when the process is signalled (a latch
> should be used to ask it to return instead).
>
> Effectively, any client code would need to change at least to add NULL
> or possibly a latch if it needs to ask it to return, and any
> alternative implementation of the WAL receiver interface would need to
> use WaitEventSet (or WaitLatchOrSocket) as its event loop instead of
> whatever it might be using now so that it can detect a latch's state.
> But in any case, any such code would fail to compile against 9.6 due
> to the new argument, and then you'd only be able to get the new return
> value if you asked for it by passing in a latch. What affected code
> are we aware of -- either users of libpqwalreceiver.so or other WAL
> receiver implementations?

Any negative value returned by walrcv_receive would have stopped the
replication stream, not only -1. And as you say, it's actually a good
thing that the interface of walrcv_receive is changing with this
patch: that way compilation would simply fail, and failures would not
happen silently. That's too low-level to be mentioned in the release
notes either way, so just a compilation failure is acceptable to me.
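
For illustration, here is a minimal sketch of what a caller of the
updated interface might look like. Only the latch argument and the
WALRCV_RECEIVE_* sentinels come from the description above; the
argument order, the surrounding loop and the helper names are
assumptions:

/*
 * Hypothetical sketch of a caller of the updated walrcv_receive
 * interface.  ProcessMessage() and HandleWakeup() are illustrative
 * placeholders, not functions from the patch.
 */
for (;;)
{
	char	   *buf;
	int			len;

	/* Pass our latch so that another process can wake us up. */
	len = walrcv_receive(timeout_ms, &buf, &WalRcv->latch);

	if (len > 0)
		ProcessMessage(buf, len);	/* got a message of that length */
	else if (len == 0)
		continue;					/* timeout, just retry */
	else if (len == WALRCV_RECEIVE_COPY_ENDED)
		break;						/* replication stream has ended */
	else if (len == WALRCV_RECEIVE_LATCH_SET)
	{
		ResetLatch(&WalRcv->latch);
		HandleWakeup();				/* e.g. send an apply notification */
	}
}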

>> Do you have numbers to share regarding how the latch-based approach
>> performs compared to the approach that used SIGUSR2 when
>> remote_apply is used?
>
> I couldn't measure any significant change (Linux, all on localhost, 128 cores):
>
> pgbench -c 1 -N bench2 -T 600
>
> 0001-remote-apply-v5.patch (signals), remote_apply -> 449 TPS
> 0001-remote-apply-v6.patch (latches), remote_apply -> 452 TPS
>
> pgbench -c 64 -j 32 -N bench2 -T 600
>
> 0001-remote-apply-v5.patch (signals), remote_apply -> 8536 TPS
> 0001-remote-apply-v6.patch (latches), remote_apply -> 8534 TPS

So both approaches deliver the same kind of performance; any
difference is likely noise.

> Incidentally, I also did some testing on what happens when you signal
> a process that is busily writing and fsyncing. I tested a few
> different kernels, write sizes and disk latencies and saw that things
> were fine on all of them up to 10k signals/sec, but after that some
> systems' fsync performance started to degrade. Only Linux on Power was
> still happily fsyncing at around 3/4 of full speed when signalled with
> a 2MHz+ tight kill loop (!), while FreeBSD and Linux on Intel weren't
> able to make much progress at all under such adversity. So I suppose
> that if you could get the commit rate up into 5 figures you might be
> able to measure an improvement for the latch version due to
> latch-collapsing, though I noticed a large amount of signal-collapsing
> going on at the OS level on all systems I tested anyway, so maybe it
> wouldn't make a difference. I attach that test program for interest.

Interesting results.
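
The attachment itself is not reproduced here, but a hypothetical
sketch of the kind of tight kill loop described above could be as
simple as:

/*
 * Hypothetical sketch of a tight kill loop: signal a target process
 * as fast as possible while it is busy writing and fsyncing, to see
 * whether its fsync throughput degrades.
 */
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	pid_t		pid;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = (pid_t) atoi(argv[1]);

	/* Signal the target in a tight loop until it goes away. */
	while (kill(pid, SIGUSR2) == 0)
		;
	perror("kill");
	return 0;
}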

> Also, I updated the comment for the declaration of the latch in
> walreceiver.h to say something about the new usage.
>
> New patch series attached.

static void WalRcvQuickDieHandler(SIGNAL_ARGS);

-
static void
ProcessWalRcvInterrupts(void)
This hunk is just noise (an unrelated whitespace change).

+ ret = WaitLatchOrSocket(latch, events, PQsocket(streamConn), timeout_ms);
+
+ if (ret & WL_POSTMASTER_DEATH)
+ exit(0);
Exiting from within libpqwalreceiver.so is no good. I think that even
in the case of postmaster death we should allow the walreceiver to
clean up after itself.
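
One possible alternative, sketched here with an invented sentinel
name, would be to report postmaster death to the caller instead of
exiting inside the shared library, so that the walreceiver itself can
decide how to shut down:

ret = WaitLatchOrSocket(latch, events, PQsocket(streamConn), timeout_ms);

if (ret & WL_POSTMASTER_DEATH)
	return WALRCV_RECEIVE_POSTMASTER_DEATH;	/* invented sentinel: let the
											 * walreceiver clean up and
											 * proc_exit() on its own */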

/*
+ * Wake up the walreceiver if it happens to be blocked in walrcv_receive,
+ * and tell it that a commit record has been applied.
+ *
+ * This is called by the startup process whenever interesting xlog records
+ * are applied, so that walreceiver can check if it needs to send an apply
+ * notification back to the master which may be waiting in a COMMIT with
+ * synchronous_commit = remote_apply.
+ */
+void
+WalRcvWakeup(void)
+{
+ SetLatch(&WalRcv->latch);
+}
I think it would be good to add an assertion here stating that this
can only be called by the startup process while in recovery, since
WalRcv->latch is not protected by any lock.
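
A minimal sketch of such an assertion, assuming the AmStartupProcess()
macro from miscadmin.h (the exact condition is a suggestion, not part
of the patch):

void
WalRcvWakeup(void)
{
	/*
	 * WalRcv->latch is not protected by any lock, so only the startup
	 * process may set it from outside the walreceiver, and only while
	 * recovery is in progress.
	 */
	Assert(AmStartupProcess() && InRecovery);

	SetLatch(&WalRcv->latch);
}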

+maximum of 'timeout' ms. If a message was successfully read, returns
+its length. Otherwise returns 0 for timeout, WALRCV_RECEIVE_COPY_ENDED
+for disconnection or WALRCV_RECEIVE_LATCH_SET. On success, a pointer
Giving the timeout return value a named constant as well would be good,
for consistency with the other sentinel values.
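
For instance, the timeout case could get its own name next to the
existing sentinels. WALRCV_RECEIVE_COPY_ENDED is -1 per the discussion
above; the value of WALRCV_RECEIVE_LATCH_SET is assumed here for
illustration:

/* Sketch: naming every sentinel return value of walrcv_receive */
#define WALRCV_RECEIVE_TIMEOUT		0		/* no message within 'timeout' ms */
#define WALRCV_RECEIVE_COPY_ENDED	(-1)	/* replication stream ended */
#define WALRCV_RECEIVE_LATCH_SET	(-2)	/* latch was set (value assumed) */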

I have also been thinking a lot about this patch, and the fact that
the WAL receiver latch is used within the internals of
libpqwalreceiver has been bugging me, because it makes the wait phase
happening within libpqwalreceiver depend on something that, until now,
only the WAL receiver itself had control over. (Among the alternatives
I considered: having a second latch for libpqwalreceiver, having an
event interface for libpqwalreceiver, switching libpq_receive into
being asynchronous...) In the end, we need a way to allow the startup
process to let the WAL receiver process know, via shared memory, that
it needs to be interrupted, and that's exactly what the WAL receiver
latch provides; the removal of the epoll stuff also cleans up some
code. So I have finally made up my mind on 0001: it looks good to me
except for the small things mentioned above.
--
Michael
