Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2022-04-12 20:05:11
Message-ID: CA+hUKGLtn-309+p6pFpR7AdQ+ypx7Zy=LhF6Bardie_bm-pb8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker
<ilmari(at)ilmari(dot)org> wrote:
> Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> writes:
> > This is a nice feature if it is safe to turn off full_page_writes.

As other have said/shown, it does also help if a block with FPW is
evicted and then read back in during one checkpoint cycle, in other
words if the working set is larger than shared buffers.

This also provides infrastructure for proposals in the next cycle, as
part of commitfest #3316:
* in direct I/O mode, I/O stalls become more likely due to lack of
kernel prefetching/double-buffering, so prefetching becomes more
essential
* even in buffered I/O mode when benefiting from free
double-buffering, the copy from kernel buffer to user space buffer can
be finished in the background instead of calling pread() when you need
the page, but you need to start it sooner
* adjacent blocks accessed by nearby records can be merged into a
single scatter-read, for example with preadv() in the background
* repeated buffer lookups, pins, locks (and maybe eventually replay)
to the same page can be consolidated

Pie-in-the-sky ideas:
* someone might eventually want to be able to replay in parallel
(hard, but certainly requires lookahead)
* I sure hope we'll eventually use different techniques for torn-page
protection to avoid the high online costs of FPW

> > When is it safe to do that? On which platform?
> >
> > I am not aware of any released software that allows full_page_writes
> > to be safely disabled. Perhaps something has been released recently
> > that allows this? I think we have substantial documentation about
> > safety of other settings, so we should carefully document things here
> > also.
>
> Our WAL reliability docs claim that ZFS is safe against torn pages:
>
> https://www.postgresql.org/docs/current/wal-reliability.html:
>
> If you have file-system software that prevents partial page writes
> (e.g., ZFS), you can turn off this page imaging by turning off the
> full_page_writes parameter.

Unfortunately, posix_fadvise(WILLNEED) doesn't do anything on ZFS
right now :-(. I have some patches to fix that on Linux[1] and
FreeBSD and it seems like there's a good chance of getting them
committed based on feedback, but it needs some more work on tests and
mmap integration. If anyone's interested in helping get that landed
faster, please ping me off-list.

[1] https://github.com/openzfs/zfs/pull/9807

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-04-12 20:33:39 Re: make MaxBackends available in _PG_init
Previous Message Nathan Bossart 2022-04-12 19:59:08 Re: make MaxBackends available in _PG_init