Re: WAL prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-16 19:41:20
Message-ID: 20180616194120.x4gsw2np5jhm7xni@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-06-16 21:34:30 +0200, Tomas Vondra wrote:
> > - it leads to double buffering that is just about guaranteed to
> > *never* be useful. Because we'd only prefetch whenever there's an
> > upcoming write, there's simply no benefit in the page staying in the
> > page cache; we'll write the whole page back to the OS.
>
> How does reading directly into shared buffers substantially change the
> behavior? The only difference is that we end up with the double
> buffering after performing the write, which is expected to happen pretty
> quickly after the read request.

Pages read directly in response to a read() request can be cached
differently than pages brought in by posix_fadvise(WILLNEED), and we
could trivially force that with another fadvise(). There's pretty much
no other case so far where we know as clearly as here that we won't
re-read the page until we write it.
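A minimal C sketch of the two cache paths being contrasted, assuming a plain relation file and PostgreSQL's default 8 kB block size. prefetch_block() only hints the kernel, leaving the page in the OS page cache for an unknown time. read_block_nocache() reads the block with pread() into a private buffer (standing in for a shared buffer) and then issues POSIX_FADV_DONTNEED; treating DONTNEED as the "another fadvise()" is an assumption, since the message does not say which advice it means. Both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* PostgreSQL's default block size */

/* Ask the kernel to start reading a block into the page cache.
 * Returns 0 on success or an errno value (posix_fadvise does not set errno). */
static int prefetch_block(int fd, long blkno)
{
    return posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

/* Read a block into a private buffer (standing in for a shared buffer), then
 * hint that the kernel need not keep its copy: the page will be written back
 * before anyone re-reads it, so the OS copy would only be a second buffer.
 * Returns 0 on success, -1 on a short or failed read, or an errno value. */
static int read_block_nocache(int fd, long blkno, char *buf)
{
    off_t off = (off_t) blkno * BLCKSZ;

    if (pread(fd, buf, BLCKSZ, off) != BLCKSZ)
        return -1;
    return posix_fadvise(fd, off, BLCKSZ, POSIX_FADV_DONTNEED);
}
```

On Linux, POSIX_FADV_DONTNEED is generally only able to drop clean pages. That holds right after the read, but the later buffered write of the modified page puts a copy back into the page cache, which is the post-write double buffering mentioned in the quoted text.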

> > - you don't have any sort of completion notification, so you basically
> > just have to guess how far ahead you want to read. If you read a bit
> > too much, you suddenly get into synchronous blocking land.
> > - The OS page cache is actually not particularly scalable to large amounts
> > of data either. Nor are its decisions about what to keep cached likely to
> > be particularly useful.
>
> The posix_fadvise approach is not perfect, no doubt about that. But it
> works pretty well for bitmap heap scans, and it's about 13249x better
> (rough estimate) than the current solution (no prefetching).
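Without completion notification, the prefetch distance is a fixed guess, much as effective_io_concurrency sets it for bitmap heap scans. A minimal C sketch of such a lookahead loop, assuming a hypothetical BlockRef type for block references decoded from WAL records. In a real implementation the prefetch callback would call something like posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED); the Trace helpers are demo instrumentation that only records the order in which hints and applies are issued.

```c
#include <stddef.h>

/* Hypothetical block reference decoded from a WAL record. */
typedef struct {
    unsigned rel;   /* relation identifier (illustrative only) */
    long blkno;     /* block number within the relation */
} BlockRef;

typedef void (*block_fn)(const BlockRef *ref, void *arg);

/*
 * Replay refs[0..n) in order, keeping the prefetch cursor up to `distance`
 * records ahead of the record being applied. There is no completion
 * notification: the loop never learns whether a hinted read has finished,
 * so `distance` is a fixed guess.
 */
static void replay_with_lookahead(const BlockRef *refs, size_t n,
                                  size_t distance, block_fn prefetch,
                                  block_fn apply, void *arg)
{
    size_t next_prefetch = 0;

    for (size_t i = 0; i < n; i++) {
        /* Hint every not-yet-hinted block up to index i + distance. */
        while (next_prefetch < n && next_prefetch <= i + distance) {
            prefetch(&refs[next_prefetch], arg);
            next_prefetch++;
        }
        /* Apply the record; a real implementation waits here if the
         * read for this block is still in flight. */
        apply(&refs[i], arg);
    }
}

/* Demo instrumentation: records 'P' (prefetch) and 'A' (apply) events. */
#define TRACE_MAX 64
typedef struct {
    char kind[TRACE_MAX];
    long blkno[TRACE_MAX];
    size_t len;
} Trace;

static void trace_event(Trace *t, char kind, const BlockRef *ref)
{
    if (t->len < TRACE_MAX) {
        t->kind[t->len] = kind;
        t->blkno[t->len] = ref->blkno;
        t->len++;
    }
}

static void trace_prefetch(const BlockRef *ref, void *arg) { trace_event(arg, 'P', ref); }
static void trace_apply(const BlockRef *ref, void *arg) { trace_event(arg, 'A', ref); }
```

For example, replaying blocks 10, 11, 12 and 13 with a distance of 2 yields the event order P10 P11 P12 A10 P13 A11 A12 A13. Too small a distance and apply still waits on the read; too large and prefetched pages may be evicted before replay reaches them, which is the synchronous-blocking case described above.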

Sure, but investing in an architecture we know might not live long also
has its cost, especially if it's not that complicated to do better.

> My point was that I don't think this actually adds a significant amount
> of work to the direct IO patch, as we already do prefetch for bitmap
> heap scans. So this needs to be written anyway, and I'd expect those two
> places to share most of the code. So where's the additional work?

I think it's largely separate from what we'd do for bitmap
heap scans.

Greetings,

Andres Freund
