Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Sait Talha Nisanci <Sait(dot)Nisanci(at)microsoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)
Date: 2020-08-27 20:28:54
Message-ID: 20200827202853.GT29590@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > Hm? At least earlier versions didn't do prefetching for records with an
> > > FPW, and only for subsequent records affecting the same page, or if the
> > > page is not in s_b anymore.
> >
> > We don't actually read the page when we're replaying an FPW though..?
> > If we don't read it, and we entirely write the page from the FPW, how is
> > pre-fetching helping..?
>
> Suppose there is a checkpoint. Then we replay a record with an FPW,
> pre-fetching nothing. Then the buffer gets evicted from
> shared_buffers, and maybe the OS cache too. Then, before the next
> checkpoint, we again replay a record for the same page. At this point,
> pre-fetching should be helpful.

Sure, but if we're talking about 25GB of WAL on a server that's got 32GB
of memory, then why would those pages end up getting evicted from memory
entirely? Particularly, why would enough of them be evicted to end up
with such a huge difference in replay time?

I do agree that if we've got more outstanding WAL between checkpoints
than the system's got memory then that certainly changes things, but
that wasn't what I understood the case to be here.
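To make the two positions concrete, here is a toy model of replay within a single checkpoint cycle. It is only a sketch under loud assumptions: full_page_writes is on, every record touches exactly one page, and shared_buffers plus the OS cache are lumped into one LRU of `cache_pages` pages. The function name `count_reads` and both parameters are made up for illustration and are not part of PostgreSQL.

```python
from collections import OrderedDict


def count_reads(records, cache_pages):
    """Count the page reads replay would need in this toy model.

    records:     iterable of page ids, one per WAL record, all replayed
                 within one checkpoint cycle (hypothetical input).
    cache_pages: number of pages that fit in memory, modelled as a
                 single LRU (an assumption, not how the real caches work).
    """
    cache = OrderedDict()  # page id -> None, ordered by recency of use
    seen = set()           # pages already touched since the checkpoint
    reads = 0
    for page in records:
        if page not in seen:
            # First touch after the checkpoint: the record carries an FPW,
            # so the page is overwritten wholesale and no read is needed.
            seen.add(page)
        elif page not in cache:
            # Later touch of a page that has since been evicted: replay has
            # to read it back in. This is the read prefetching could hide.
            reads += 1
        cache[page] = None
        cache.move_to_end(page)
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used page
    return reads
```

In this model, a working set that fits in memory never triggers a read after the FPW, while a working set larger than memory does, which is the distinction being argued over here.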

> Admittedly, I don't quite understand whether that is what is happening
> in this test case, or why SSD vs. HDD should make any difference. But
> there doesn't seem to be any reason why it doesn't make sense in
> theory.

I agree that this could be a reason, but it doesn't seem to quite fit in
this particular case given the amount of memory and WAL. I suspect
that it's something else, and I'd very much like to know if it's a
general "this applies to all (most? a lot of?) SSDs because the
hardware has a larger than 8KB page size and therefore the kernel has to
read it", or if it's something odd about this particular system and
doesn't apply generally.
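The page-size hypothesis can be put as simple arithmetic: an 8KB write that only partly covers a larger underlying unit would leave the rest of that unit to be filled in from somewhere, presumably by reading it first. The sketch below just counts such partly covered units. The function name `partial_units` and its parameters are hypothetical, and whether a given kernel or SSD really performs a read in this situation is exactly the open question, not something this code demonstrates.

```python
def partial_units(offset, length, unit):
    """Return how many `unit`-sized blocks a write only partly covers.

    offset, length: byte range of the write (e.g. one 8192-byte page).
    unit:           size in bytes of the underlying block (assumed value,
                    e.g. a hypothetical 16384-byte hardware page).
    """
    start, end = offset, offset + length
    first = start // unit
    last = (end - 1) // unit
    partial = 0
    for blk in range(first, last + 1):
        blk_start = blk * unit
        blk_end = blk_start + unit
        # The block is only partly covered if the write begins after the
        # block's start or stops before the block's end.
        if start > blk_start or end < blk_end:
            partial += 1
    return partial
```

For example, `partial_units(0, 8192, 8192)` is 0 because the write covers its unit exactly, while `partial_units(0, 8192, 16384)` is 1 because half of the larger unit is left untouched by the write.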

Thanks,

Stephen
