Re: WAL prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-15 18:01:42
Message-ID: 20180615180142.jquzzrhy25v3ouuu@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
>
>
> On 14.06.2018 09:52, Thomas Munro wrote:
> > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > > The pg_wal_prefetch function will traverse the WAL indefinitely and
> > > prefetch the blocks referenced in WAL records using the
> > > posix_fadvise(WILLNEED) system call.
> > Hi Konstantin,
> >
> > Why stop at the page cache... what about shared buffers?
> >
>
> It is a good question. I have thought a lot about prefetching directly into
> shared buffers.

I think that's definitely how this should work. I'm pretty strongly
opposed to a prefetching implementation that doesn't read into s_b.
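
For concreteness, the page-cache-only prefetch discussed above boils down to
roughly the following. This is only a minimal sketch, not code from the
patch: the helper name prefetch_block and the SKETCH_BLCKSZ constant are made
up for illustration, and 8192 is assumed because it is the default block
size. Note that it only asks the kernel to warm the OS page cache; nothing
here reads into shared buffers.

```c
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: 8192 bytes, PostgreSQL's default block size. */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical helper (not from the patch): hint to the kernel that block
 * `blkno` of the already-open relation file `fd` will be needed soon.
 *
 * Returns 0 on success, or an error number on failure. posix_fadvise()
 * returns the error directly and does not set errno.
 */
static int
prefetch_block(int fd, uint32_t blkno)
{
    assert(fd >= 0);

    return posix_fadvise(fd,
                         (off_t) blkno * SKETCH_BLCKSZ,  /* byte offset */
                         SKETCH_BLCKSZ,                  /* length */
                         POSIX_FADV_WILLNEED);
}
```

A caller would invoke prefetch_block() once per block reference decoded from
a WAL record. The call is advisory only: the kernel may ignore it, and the
later read during replay still has to copy the page into shared buffers.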

> But the current reality with Postgres is that allocating too much memory
> for shared buffers is not recommended, for many different reasons:
> degradation of the clock replacement algorithm, "write storms", ...

I think a lot of that fear is overplayed. And we've fixed a number of
issues. We don't really generate write storms in the default config
anymore in most scenarios, and if it's an issue you can turn on
backend_flush_after.

> If your system has 1TB of memory, almost no PostgreSQL administrator
> will recommend using all of that 1TB for shared buffers.

I've used 1TB successfully.

> Also, PostgreSQL does not currently support changing the shared buffers
> size dynamically. Without that, the only way to use Postgres in clouds and
> other multiuser systems, where the system load is not fully controlled by
> the user, is to choose a relatively small shared buffers size and rely on
> OS caching.

That seems largely unrelated to the replay case, because there the data
will be read into shared buffers anyway. And it'll be dirtied therein.

Greetings,

Andres Freund
