Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sait Talha Nisanci <Sait(dot)Nisanci(at)microsoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)
Date: 2020-08-29 22:14:50
Message-ID: 20200829221450.t7omssadp2i6bbcx@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 27, 2020 at 04:28:54PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > > Hm? At least earlier versions didn't do prefetching for records with an fpw, and only for subsequent records affecting the same or if not in s_b anymore.
>> >
>> > We don't actually read the page when we're replaying an FPW though..?
>> > If we don't read it, and we entirely write the page from the FPW, how is
>> > pre-fetching helping..?
>>
>> Suppose there is a checkpoint. Then we replay a record with an FPW,
>> pre-fetching nothing. Then the buffer gets evicted from
>> shared_buffers, and maybe the OS cache too. Then, before the next
>> checkpoint, we again replay a record for the same page. At this point,
>> pre-fetching should be helpful.
>
>Sure- but if we're talking about 25GB of WAL, on a server that's got
>32GB, then why would those pages end up getting evicted from memory
>entirely? Particularly, enough of them to end up with such a huge
>difference in replay time..
>
>I do agree that if we've got more outstanding WAL between checkpoints
>than the system's got memory then that certainly changes things, but
>that wasn't what I understood the case to be here.
>

I don't think it's very clear how much WAL there actually was in each
case - the message only said there was more than 25GB, but who knows how
many checkpoints that covers? In the cases with FPW=on this may easily
be much less than one checkpoint (because with scale 45GB an update to
every page will log 45GB of full-page images). It'd be interesting to
see some stats from pg_waldump etc.

>> Admittedly, I don't quite understand whether that is what is happening
>> in this test case, or why SDD vs. HDD should make any difference. But
>> there doesn't seem to be any reason why it doesn't make sense in
>> theory.
>
>I agree that this could be a reason, but it doesn't seem to quite fit in
>this particular case given the amount of memory and WAL. I'm suspecting
>that it's something else and I'd very much like to know if it's a
>general "this applies to all (most? a lot of?) SSDs because the
>hardware has a larger than 8KB page size and therefore the kernel has to
>read it", or if it's something odd about this particular system and
>doesn't apply generally.
>

Not sure. I doubt it has anything to do with the hardware page size,
that's mostly transparent to the kernel anyway. But it might be that the
prefetching on a particular SSD has more overhead than what it saves.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-08-29 22:43:47 Re: list of extended statistics on psql
Previous Message Tomas Vondra 2020-08-29 21:54:58 Re: list of extended statistics on psql