Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-03-18 05:18:44
Message-ID: CA+hUKG+j8F9heWynhBwTpJ3UrZWLaDatzNkamwaWgJ7h54SVEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 18, 2020 at 2:47 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> On 2020-Mar-17, Thomas Munro wrote:
> > I didn't want counters that wrap at ~4 billion, but I did want to be
> > able to read and write concurrently without tearing. Instructions
> > like "lock xadd" would provide more guarantees that I don't need,
> > since only one thread is doing all the writing and there's no ordering
> > requirement. It's basically just counter++, but some platforms need a
> > spinlock to perform atomic read and write of 64 bit wide numbers, so
> > more hoop jumping is required.
>
> Ah, I see, you don't want lock xadd ... That's non-obvious. I suppose
> the function could use more commentary on *why* you're doing it that way
> then.

I updated the comment:

+/*
+ * On modern systems this is really just *counter++. On some older systems
+ * there might be more to it, due to inability to read and write 64 bit values
+ * atomically. The counters will only be written to by one process, and there
+ * is no ordering requirement, so there's no point in using higher overhead
+ * pg_atomic_fetch_add_u64().
+ */
+static inline void inc_counter(pg_atomic_uint64 *counter)

> > > Umm, I would keep the return values of both these functions in sync.
> > > It's really strange that PrefetchBuffer does not return
> > > PrefetchBufferResult, don't you think?
> >
> > Agreed, and changed. I suspect that other users of the main
> > PrefetchBuffer() call will eventually want that, to do a better job of
> > keeping the request queue full, for example bitmap heap scan and
> > (hypothetical) btree scan with prefetch.
>
> LGTM.

Here's a new version that changes that part just a bit more, after a
brief chat with Andres about his async I/O plans. It seems clear that
returning an enum isn't very extensible, so I decided to try making
PrefetchBufferResult a struct whose contents can be extended in the
future. In this patch set it's still just used to distinguish 3 cases
(hit, miss, no file), but it's now expressed as a buffer and a flag to
indicate whether I/O was initiated. You could imagine that the second
thing might be replaced by a pointer to an async I/O handle you can
wait on or some other magical thing from the future.

The concept here is that eventually we'll have just one XLogReader for
both read ahead and recovery, and we could attach the prefetch results
to the decoded records, and then recovery would try to use already
looked up buffers to avoid a bit of work (and then recheck). In other
words, the WAL would be decoded only once, and the buffers would
hopefully be looked up only once, so you'd claw back all of the
overheads of this patch. For now that's not done, and the buffer in
the result is only compared with InvalidBuffer to check if there was a
hit or not.

Similar things could be done for bitmap heap scan and btree prefetch
with this interface: their prefetch machinery could hold onto these
results in their block arrays and try to avoid a more expensive
ReadBuffer() call if they already have a buffer (though as before,
there's a small chance it turns out to be the wrong one and they need
to fall back to ReadBuffer()).

> As before, I didn't get to reading 0005 in depth.

Updated to account for the above-mentioned change, and with a couple
of elog() calls changed to ereport().

Attachment Content-Type Size
0001-Allow-PrefetchBuffer-to-be-called-with-a-SMgrRela-v5.patch text/x-patch 7.1 KB
0002-Rename-GetWalRcvWriteRecPtr-to-GetWalRcvFlushRecP-v5.patch text/x-patch 10.5 KB
0003-Add-WalRcvGetWriteRecPtr-new-definition-v5.patch text/x-patch 3.7 KB
0004-Allow-PrefetchBuffer-to-report-what-happened-v5.patch text/x-patch 11.3 KB
0005-Prefetch-referenced-blocks-during-recovery-v5.patch text/x-patch 46.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-03-18 05:58:30 Re: proposal: schema variables
Previous Message p.b uday 2020-03-18 05:11:31 GSoC applicant proposal, Uday PB