Re: Use fadvise in wal replay

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kirill Reshke <reshke(at)double(dot)cloud>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use fadvise in wal replay
Date: 2022-08-07 01:39:34
Message-ID: CALj2ACW_GYJfmUJ9ZDrv4ZJ+Q0FGyN-rC-mo5NFsAXgj4UCygg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 6, 2022 at 10:53 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Hi Bharath,
>
> thank you for the suggestion.
>
> > On 5 Aug 2022, at 16:02, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Thu, Aug 4, 2022 at 9:48 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >>
> >>> On 18 Jul 2022, at 22:55, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>>
> >>> On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> wrote:
> >
> > I have a fundamental question on the overall idea: how beneficial
> > will it be if the process that's reading the current WAL page is the
> > only one that does (or at least attempts) the prefetching of future WAL
> > pages? Won't the benefit be higher if "some" other background process
> > does the prefetching?
>
> IMO prefetching from another thread would have a negative effect.
> The fadvise() call is non-blocking, so the startup process won't do IO. It just tells the kernel to schedule an asynchronous page read.
> On the other hand, synchronization with another process might cost more than fadvise().

Hm, the POSIX_FADV_WILLNEED flag makes fadvise() non-blocking.

> Anyway, the cost of calling fadvise() once per 16 page reads is negligible.

Agree. Why can't we just prefetch the entire WAL file once, when it
is opened for the first time? Does the OS have any limit on the maximum
size that can be prefetched at once? It may sound aggressive, but it
avoids repeated fadvise() system calls. This would be especially useful
when there are many WAL files to recover (crash, PITR or standby
recovery), since eventually we want the whole WAL file to be prefetched
anyway.
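
For illustration, here is a minimal sketch of what "prefetch the whole
file once" could look like at the system call level. It is not taken
from any patch in this thread; prefetch_whole_file() and
release_whole_file() are hypothetical helper names, and the sketch
assumes a platform that provides posix_fadvise(). Passing len = 0 means
"from offset to the end of the file". The call is only a hint, so the
kernel may read less than what was asked for.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): hint the kernel to start
 * reading the whole file behind fd into the page cache.
 *
 * Returns 0 on success, or an errno value on failure, which is how
 * posix_fadvise() itself reports errors (it does not set errno).
 */
static int
prefetch_whole_file(int fd)
{
    int rc;

    assert(fd >= 0);

    /* offset = 0, len = 0: the advice covers the file from start to end */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise(WILLNEED): %s\n", strerror(rc));
    return rc;
}

/*
 * Hypothetical counterpart: tell the kernel the cached pages of the file
 * are not expected to be needed again.
 */
static int
release_whole_file(int fd)
{
    int rc;

    assert(fd >= 0);

    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise(DONTNEED): %s\n", strerror(rc));
    return rc;
}
```

With this shape there is one system call per WAL segment instead of one
per N pages, at the cost of leaving the read-ahead pacing entirely to
the kernel.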

If prefetching the entire WAL file is okay, we could go further:
1) prefetch in XLogFileOpen() and all of the segment_open callbacks, 2)
release in XLogFileClose() (this is already being done) and all of the
segment_close callbacks, perhaps optionally.
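
A rough sketch of the open/close pairing, outside of PostgreSQL:
segment_open_prefetch() and segment_close_release() below are
hypothetical stand-ins for the segment_open/segment_close callbacks, not
the real signatures, and the drop_cache flag is an assumed way to make
the release step optional.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a segment_open callback: open the segment and
 * immediately hint that the whole file will be read.
 *
 * Returns the file descriptor, or -1 if open() fails.
 */
static int
segment_open_prefetch(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Best-effort hint; a failure here does not prevent reading the file */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    return fd;
}

/*
 * Hypothetical stand-in for a segment_close callback: optionally tell the
 * kernel the cached pages are no longer needed, then close the file.
 *
 * Returns the result of close().
 */
static int
segment_close_release(int fd, int drop_cache)
{
    assert(fd >= 0);

    if (drop_cache)
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return close(fd);
}
```

Keeping the release optional matters for callers that expect the same
segment to be read again soon.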

Also, can't we use an existing function FilePrefetch()? That way,
there is no need for a new wait event type.

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
