Re: WAL prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-17 00:00:14
Message-ID: 20180617000014.dpnevksklxrajufg@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-06-16 23:25:34 +0300, Konstantin Knizhnik wrote:
>
>
> On 16.06.2018 22:02, Andres Freund wrote:
> > On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
> > >
> > > On 06/15/2018 08:01 PM, Andres Freund wrote:
> > > > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > > > >
> > > > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > > > <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > > > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > > > references in WAL records
> > > > > > > using posix_fadvise(WILLNEED) system call.
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Why stop at the page cache... what about shared buffers?
> > > > > >
> > > > > > It is a good question. I thought a lot about prefetching directly to shared
> > > > > > buffers.
> > > > I think that's definitely how this should work. I'm pretty strongly
> > > > opposed to a prefetching implementation that doesn't read into s_b.
> > > >
> > > Could you elaborate why prefetching into s_b is so much better (I'm sure it
> > > has advantages, but I suppose prefetching into page cache would be much
> > > easier to implement).
> > I think there's a number of issues with just issuing prefetch requests
> > via fadvise etc:
> >
> > - it leads to guaranteed double buffering, in a way that's just about
> > guaranteed to *never* be useful. Because we'd only prefetch whenever
> > there's an upcoming write, there's simply no benefit in the page
> > staying in the page cache - we'll write out the whole page back to the
> > OS.
>
> Sorry, I do not completely understand this.

> Prefetch is only needed for a partial update of a page - in this case we need
> to first read the page from disk

Yes.

> before being able to perform the update. So before "we'll write out the whole
> page back to the OS" we have to read this page.
> And if the page is in the OS cache (prefetched) then it can be done much faster.

Yes.

> Please notice that at the moment of prefetch there is no double
> buffering.

Sure, but as soon as it's read there is.
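
For illustration, a minimal sketch of the page-cache-only approach being
discussed, assuming a Unix system where Python exposes posix_fadvise. The
helper names prefetch_block and read_block are hypothetical and are not taken
from the patch. Once read_block copies the block into the caller's own buffer
(standing in for shared buffers here), the same data sits both there and in
the OS page cache.

```python
import os

BLCKSZ = 8192  # PostgreSQL's default block size


def prefetch_block(fd, block_no):
    """Hint the kernel that this block will be needed soon.

    Returns False when posix_fadvise is unavailable on this platform.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    os.posix_fadvise(fd, block_no * BLCKSZ, BLCKSZ, os.POSIX_FADV_WILLNEED)
    return True


def read_block(fd, block_no):
    """Copy the block out of the page cache into a private buffer.

    After this call the data exists twice: in the page cache and in the
    returned buffer.
    """
    return os.pread(fd, BLCKSZ, block_no * BLCKSZ)
```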

> As long as the page has not been accessed before, it is not present in shared
> buffers. And once the page is updated, there is really no need to keep it in
> shared buffers. We can use cyclic buffers (like in the case of sequential scan
> or bulk update) to prevent the redo process from throwing away useful pages
> from shared buffers. So once again there will be no double buffering.

That's a terrible idea. There's a *lot* of spatial locality of further
WAL records arriving for the same blocks.
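
A toy model of that effect, under stated assumptions: the cyclic buffer is
modelled as a very small LRU cache, the block reference trace is made up, and
count_reads is a hypothetical helper. It only shows that when WAL records keep
returning to the same blocks, a cache too small to hold them re-reads every
block each time.

```python
from collections import OrderedDict


def count_reads(block_refs, capacity):
    """Count the reads needed to replay block_refs with an LRU cache of
    `capacity` blocks."""
    cache = OrderedDict()
    reads = 0
    for block in block_refs:
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
            continue
        reads += 1                     # miss: the block has to be read
        cache[block] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block
    return reads


# Made-up trace: redo touches four blocks, then touches them all again.
wal_block_refs = [0, 1, 2, 3, 0, 1, 2, 3]

ring_reads = count_reads(wal_block_refs, capacity=2)    # small cyclic buffer
cache_reads = count_reads(wal_block_refs, capacity=4)   # room for all blocks
```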

> I am not so familiar with the current implementation of the full page writes
> mechanism in Postgres.
> So maybe my idea explained below is stupid or already implemented (but I
> failed to find any traces of this).
> Prefetch is needed only for WAL records performing a partial update. A full
> page write doesn't require prefetch.
> A full page write has to be performed when the page is updated for the first
> time after a checkpoint.
> But what if we slightly extend this rule and perform a full page write also
> when the distance from the previous full page write exceeds some delta
> (which is somehow related to the size of the OS cache)?
>
> In this case even if checkpoint interval is larger than OS cache size, we
> still can expect that updated pages are present in OS cache.
> And no WAL prefetch is needed at all!

We could do so, but I suspect the WAL volume penalty would be
prohibitive in many cases. Worthwhile to try though.
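
A rough sketch of the extended rule, to make the trade-off concrete. All of
this is hypothetical: FpwPolicy, max_distance (the "delta", measured here in
bytes of WAL) and the per-block bookkeeping are assumptions for illustration,
not how PostgreSQL tracks full page writes. Every extra True returned adds a
full page image to the WAL, which is where the volume penalty comes from.

```python
from dataclasses import dataclass, field


@dataclass
class FpwPolicy:
    max_distance: int                 # hypothetical "delta", in bytes of WAL
    redo_lsn: int = 0                 # where the current checkpoint cycle began
    last_fpw: dict = field(default_factory=dict)  # block -> LSN of last full page image

    def checkpoint(self, lsn):
        """Start a new checkpoint cycle at the given LSN."""
        self.redo_lsn = lsn

    def needs_fpw(self, block, lsn):
        """Decide whether the record at `lsn` should carry a full page image."""
        prev = self.last_fpw.get(block)
        # Existing rule: first modification of the page after a checkpoint.
        first_touch = prev is None or prev < self.redo_lsn
        # Proposed extension: too much WAL since the previous full page image.
        too_far = prev is not None and lsn - prev > self.max_distance
        if first_touch or too_far:
            self.last_fpw[block] = lsn
            return True
        return False
```

A smaller max_distance makes it more likely that the page is still in the OS
cache at replay time, at the cost of more full page images.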

Greetings,

Andres Freund
