Re: WAL prefetch

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-16 20:25:34
Message-ID: 84da6d18-6034-4271-7c0c-b68a9947f7cf@postgrespro.ru
Lists: pgsql-hackers

On 16.06.2018 22:02, Andres Freund wrote:
> On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>>
>> On 06/15/2018 08:01 PM, Andres Freund wrote:
>>> On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
>>>>
>>>> On 14.06.2018 09:52, Thomas Munro wrote:
>>>>> On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
>>>>> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>>>>> pg_wal_prefetch function will infinitely traverse WAL and prefetch block
>>>>>> references in WAL records
>>>>>> using posix_fadvise(WILLNEED) system call.
>>>>> Hi Konstantin,
>>>>>
>>>>> Why stop at the page cache... what about shared buffers?
>>>>>
>>>> It is good question. I thought a lot about prefetching directly to shared
>>>> buffers.
>>> I think that's definitely how this should work. I'm pretty strongly
>>> opposed to a prefetching implementation that doesn't read into s_b.
>>>
>> Could you elaborate why prefetching into s_b is so much better (I'm sure it
>> has advantages, but I suppose prefetching into page cache would be much
>> easier to implement).
> I think there's a number of issues with just issuing prefetch requests
> via fadvise etc:
>
> - it leads to guaranteed double buffering, in a way that's just about
> guaranteed to *never* be useful. Because we'd only prefetch whenever
> there's an upcoming write, there's simply no benefit in the page
> staying in the page cache - we'll write out the whole page back to the
> OS.

Sorry, I do not completely understand this.
Prefetch is only needed for a partial update of a page: in this case we
first need to read the page from disk before we are able to perform the
update. So before "we'll write out the whole page back to the OS" we have
to read this page. And if the page is already in the OS cache
(prefetched), this read can be done much faster.
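For readers less familiar with the mechanism under discussion, here is a
minimal sketch of what prefetching blocks referenced by WAL records into
the OS page cache with posix_fadvise(WILLNEED) could look like. It assumes
the caller has already mapped each block reference to a relation segment
file path and block number; prefetch_blocks() is a hypothetical helper, not
the pg_wal_prefetch code, and BLCKSZ is assumed to be the default 8192.

```c
/* Request POSIX.1-2001 declarations (posix_fadvise) from the C library. */
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumed: PostgreSQL's default block size, in bytes */

/*
 * Hypothetical helper: ask the kernel to start reading the given blocks of
 * one relation segment file into the OS page cache.  posix_fadvise() only
 * queues asynchronous readahead and gives no completion notification, so a
 * later read() of the same block may still block if the I/O is not done.
 *
 * Returns the number of hints the kernel accepted, or -1 if the file could
 * not be opened.
 */
static int
prefetch_blocks(const char *path, const uint32_t *blocks, size_t nblocks)
{
    int fd;
    int accepted = 0;

    assert(blocks != NULL || nblocks == 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (size_t i = 0; i < nblocks; i++)
    {
        off_t offset = (off_t) blocks[i] * BLCKSZ;

        /* posix_fadvise() returns 0 or an error number (not -1/errno). */
        if (posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }

    close(fd);
    return accepted;
}
```

The hint returns immediately and says nothing about when the read
finishes, which is the kind of completion-notification limitation
discussed in this thread.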

Please note that at the moment of prefetch there is no double
buffering. Since the page has not been accessed yet, it is not present
in shared buffers. And once the page is updated, there is really no need
to keep it in shared buffers. We can use cyclic buffers (as in the case
of sequential scan or bulk update) to prevent the redo process from
throwing useful pages out of shared buffers. So, once again, there will
be no double buffering.

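The cyclic-buffer idea can be illustrated with a toy ring of buffer
slots: the redo process only ever cycles through a small fixed set of
slots, so it cannot push more than that many pages out of shared buffers.
This is a simplified model, not PostgreSQL's BufferAccessStrategy code;
BufferRing, ring_get_slot(), ring_used() and RING_SIZE are hypothetical
names, and the linear search stands in for the shared buffer mapping
lookup a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4 /* hypothetical, kept tiny for illustration */

/* Toy model of a buffer ring: redo may only occupy RING_SIZE slots. */
typedef struct
{
    uint32_t block[RING_SIZE]; /* block currently held in each slot */
    int      used[RING_SIZE];  /* 1 if the slot holds a block */
    size_t   next;             /* next slot to recycle */
} BufferRing;

/*
 * Return the slot that will hold `blkno`.  A block already in the ring is
 * reused in place; otherwise the slot filled longest ago is recycled, so
 * the redo process never occupies more than RING_SIZE buffers.
 */
static size_t
ring_get_slot(BufferRing *ring, uint32_t blkno)
{
    size_t slot;

    for (size_t i = 0; i < RING_SIZE; i++)
        if (ring->used[i] && ring->block[i] == blkno)
            return i;

    assert(ring->next < RING_SIZE);
    slot = ring->next;
    ring->block[slot] = blkno;
    ring->used[slot] = 1;
    ring->next = (slot + 1) % RING_SIZE;
    return slot;
}

/* Count the slots currently in use (never exceeds RING_SIZE). */
static size_t
ring_used(const BufferRing *ring)
{
    size_t n = 0;

    for (size_t i = 0; i < RING_SIZE; i++)
        n += ring->used[i] != 0;
    return n;
}
```

However many blocks redo touches, ring_used() stays at or below
RING_SIZE, which is the property that keeps the redo process from
evicting other useful pages.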
> - reading from the page cache is far from free - so you add costs to the
> replay process that it doesn't need to do.
> - you don't have any sort of completion notification, so you basically
> just have to guess how far ahead you want to read. If you read a bit
> too much you suddenly get into synchronous blocking land.
> - The OS page cache is actually not particularly scalable to large amounts of
> data either. Nor are the decisions what to keep cached likely to be
> particularly useful.
> - We imo need to add support for direct IO before long, and adding more
> and more work to reach feature parity strikes me as a bad move.

I am not very familiar with the current implementation of the full page
writes mechanism in Postgres.
So maybe my idea explained below is stupid or already implemented (but
I failed to find any traces of it).
Prefetch is needed only for WAL records performing a partial update. A
full page write doesn't require prefetch.
A full page write has to be performed when the page is updated for the
first time after a checkpoint.
But what if we slightly extend this rule and also perform a full page
write when the distance from the previous full page write of the page
exceeds some delta (which is somehow related to the size of the OS cache)?

In this case, even if the checkpoint interval is larger than the OS cache
size, we can still expect the updated pages to be present in the OS cache.
And no WAL prefetch is needed at all!
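The proposed rule can be sketched as a single decision function. The
first test mirrors, in simplified form, the existing check that a page
needs a full image when its LSN is not newer than the checkpoint's redo
pointer; the second test is the proposed extension. All names here are
hypothetical, and the sketch assumes the LSN of each page's most recent
full page image is tracked somewhere, which the proposal does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr; /* WAL position (LSN) */

/*
 * Sketch of the proposed full-page-write rule (hypothetical names):
 *  - page_lsn:     LSN of the last WAL record that modified the page
 *  - redo_ptr:     redo pointer of the latest checkpoint
 *  - last_fpw_lsn: LSN of the page's most recent full page image
 *                  (assumed to be tracked per page)
 *  - insert_lsn:   current WAL insert position
 *  - fpw_distance: the proposed delta, chosen relative to the OS cache size
 */
static bool
needs_full_page_write(XLogRecPtr page_lsn, XLogRecPtr redo_ptr,
                      XLogRecPtr last_fpw_lsn, XLogRecPtr insert_lsn,
                      uint64_t fpw_distance)
{
    /* Existing rule: first modification of the page since the checkpoint. */
    if (page_lsn <= redo_ptr)
        return true;

    /* Proposed extension: the previous image is too far back in the WAL. */
    assert(insert_lsn >= last_fpw_lsn);
    return insert_lsn - last_fpw_lsn > fpw_distance;
}
```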
