Re: WAL prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-16 19:02:10
Message-ID: 20180616190210.pqz42a5nxhqy7jw6@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>
>
> On 06/15/2018 08:01 PM, Andres Freund wrote:
> > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > >
> > >
> > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > references in WAL records
> > > > > using posix_fadvise(WILLNEED) system call.
> > > > Hi Konstantin,
> > > >
> > > > Why stop at the page cache... what about shared buffers?
> > > >
> > >
> > > It is good question. I thought a lot about prefetching directly to shared
> > > buffers.
> >
> > I think that's definitely how this should work. I'm pretty strongly
> > opposed to a prefetching implementation that doesn't read into s_b.
> >
>
> Could you elaborate why prefetching into s_b is so much better (I'm sure it
> has advantages, but I suppose prefetching into page cache would be much
> easier to implement).

I think there's a number of issues with just issuing prefetch requests
via fadvise etc:

- it leads to guaranteed double buffering, in a way that's just about
guaranteed to *never* be useful. Because we'd only prefetch whenever
there's an upcoming write, there's simply no benefit in the page
staying in the page cache - we'll write out the whole page back to the
OS.
- reading from the page cache is far from free - so you add costs to the
replay process that it doesn't need to pay.
- you don't have any sort of completion notification, so you basically
just have to guess how far ahead you want to read. If you read a bit
too much you suddenly get into synchronous blocking land.
- The OS page cache is actually not particularly scalable to large amounts of
data either. Nor are the decisions about what to keep cached likely to be
particularly useful.
- We imo need to add support for direct IO before long, and adding more
and more work to reach feature parity strikes me as a bad move.
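
To make the first three points concrete, here is a minimal sketch of what
fadvise-based prefetching looks like from userspace. It is an illustration
only, not code from the patch: BLOCK_SIZE, prefetch_block, read_block and
demo are hypothetical names, and the 8192-byte block size is assumed. The
hint call returns immediately with no completion notification, and the later
read still copies the block out of the page cache into a private buffer.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, chosen for illustration


def prefetch_block(fd, block_no):
    """Hint the kernel to load one block into the page cache.

    The call returns right away; nothing tells the caller when (or whether)
    the read actually completed.
    """
    # posix_fadvise is not available on every platform, so guard the call.
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, block_no * BLOCK_SIZE, BLOCK_SIZE,
                         os.POSIX_FADV_WILLNEED)


def read_block(fd, block_no):
    """Read one block; this copies from the page cache into a new buffer."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)


def demo():
    """Write two blocks to a temp file, prefetch the second, then read it."""
    with tempfile.TemporaryFile() as f:
        f.write(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
        f.flush()
        fd = f.fileno()
        hint_result = prefetch_block(fd, 1)  # no handle to wait on
        data = read_block(fd, 1)             # may still block on disk I/O
        return hint_result, data
```

The caller gets nothing back from the hint that it could wait on, so the
only knob is how far ahead to issue hints, which is the guessing described
above.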

Greetings,

Andres Freund
