Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2016-03-23 11:34:30
Message-ID: CAEepm=2cvEJveJhNqsNZgWN=S1ZAhMXyATr6fYhibFtpQRrDKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 23, 2016 at 12:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 14, 2016 at 2:38 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> What's with the extra block?
>>
>> Yeah, that's silly, thanks. Tidied up for the next version.
>
> Some more comments on 0001:
>
> + <literal>remote_write</>, <literal> remote_apply</>,
> <literal>local</>, and <literal>off</>.
>
> Extra space.

Fixed.

> + * apply when the transaction eventuallly commits or aborts.
>
> Spelling.

Fixed.

> + if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> + assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_FLUSH, NULL);
> +
> + SyncRepWaitForLSN(gxact->prepare_end_lsn);
> +
> + if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> + assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_APPLY, NULL);
>
> You can't do this. Directly changing the value of a variable that is
> backing a GUC is verboten, and doing it through the thin veneer of
> calling the assign-hook will not avoid the terrible wrath of the
> powers that dwell in the outer dark, and/or Pittsburgh. You probably
> need a dance here similar to the way forcePageWrites/fullPageWrites
> work.

Yeah, that was terrible. Instead of that I have now made this interface change:

-SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
+SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)

If you want to wait for non-commit records (prepare being the only
case of that), you pass false in the second argument, and then the
wait level is capped at remote flush inside that function. There is
no way to wait for non-commit records to be applied and there is no
point in making it so that you can (they don't have any user-visible
effect).

> /*
> + * Check if the caller would like to ask standbys for immediate feedback
> + * once this commit is applied.
> + */
>
> Whitespace.

Fixed.

> + /*
> + * Check if the caller would like to ask standbys for immediate feedback
> + * once this abort is applied.
> + */
>
> Whitespace again.

Fixed.

> /*
> + * doRequestWalReceiverReply is used by recovery code to ask the main recovery
> + * loop to trigger a walreceiver reply.
> + */
> +static bool doRequestWalReceiverReply;
>
> This is the sort of comment that leads me to ask "why even bother
> writing a comment?". Try to say something that's not obvious from the
> variable name. The comment for XLogRequestWalReceiverReply has a
> similar issue.

Changed.

> +static void WalRcvBlockSigUsr2(void)
>
> Style - newline after void.

Fixed.

> +static void WalRcvUnblockSigUsr2(void)
>
> And again here.

Fixed.

> + WalRcvUnblockSigUsr2();
> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> + WalRcvBlockSigUsr2();
>
> This does not seem like it will be cheap on all operating systems. I
> think you should try to rejigger this somehow so that it can just set
> the process latch and the wal receiver figures it out from looking at
> shared memory. Like maybe a flag in WalRcvData? An advantage of this
> is that it should cut down on the number of signals significantly,
> because it won't need to send SIGUSR1 when the latch is already set.

Still experimenting with a latch here. I will come back on this point soon.

> + * Although only "on", "off", "remote_apply", "remote_write", and "local" are
> + * documented, we accept all the likely variants of "on" and "off".
>
> Maybe switch to listing the undocumented values.

It follows a pattern used by several nearby bits of code, so it
doesn't look like it should be different, and besides you can see what
the undocumented values are, they're right below.

Here are some test results run on a bunch of Amazon EC2 "m3.large"
under Ubuntu Trusty in the Oregon zone, all in the same subnet.
Defaults except 1GB shared_buffers.

1. Simple sequential updates (using 'test-causal-reads.c', already
posted up-thread):

synchronous_commit TPS
==================== ====
off 9234
local 1223
remote_write 907
on 587
remote_apply 555

causal_reads TPS
==================== ====
0 cr standbys 1112
1 cr standbys 541
2 cr standbys 487
3 cr standbys 467

2. Some pgbench -c4 -j2 -N bench2 runs:

synchronous_commit TPS
==================== ====
off 3937
local 1984
remote_write 1701
on 1373
remote_apply 1349

causal_reads TPS
==================== ====
0 cr standbys 1973
1 cr standbys 1413
2 cr standbys 1282
3 cr standbys 1163

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-remote-apply-v5.patch application/octet-stream 20.9 KB
0002-replay-lag-v5.patch application/octet-stream 25.5 KB
0003-refactor-syncrep-exit-v5.patch application/octet-stream 4.6 KB
0004-causal-reads-v5.patch application/octet-stream 74.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-23 11:39:04 Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Previous Message Emre Hasegeli 2016-03-23 11:27:17 Re: BRIN is missing in multicolumn indexes documentation