Re: [HACKERS] Clock with Adaptive Replacement

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Date: 2018-04-26 02:40:40
Message-ID: CAH2-WzkG6Fd+D95GZy7Z7H_ORo60X1m3h8hMisO0+wOF+Cskig@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 25, 2018 at 6:31 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Huh. Right. So it's not truly uniform. I wonder if any of these
> algorithms are sensitive to the higher value of the leaf pages than
> the heap pages. It seems quite subtle: even though we can see that
> the individual leaf pages must be 6x more likely to be accessed again
> next time than individual heap pages due to their higher tuple
> density, they're still very unlikely to be accessed again soon, and
> the question is whether any of these algorithms can track that for
> long enough to see the difference between two very low access
> frequencies, in a huge field of unlikely-to-be-accessed pages. LRU,
> by not even attempting to model frequency, is I guess uniquely
> affected by the heap-after-index-leaf thing.

Right. Another insight is that it's worth considering weighting the
type of page involved, to artificially favor indexes to some degree.
I'm not saying that it's a good idea, and it's pretty inelegant. But
I'm pretty sure it's been done before, with satisfactory results.
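To be concrete about what I mean (purely an illustrative sketch, not
actual PostgreSQL code; the names and the index weight of 2 are made
up), a clock sweep could simply give index pages a bigger usage-count
bump than heap pages, so that they survive more sweep passes:

#include <stdbool.h>

#define MAX_USAGE_COUNT 5

typedef enum { PAGE_HEAP, PAGE_INDEX } PageKind;

typedef struct
{
    PageKind    kind;
    int         usage_count;
    bool        valid;
} BufferSketch;

/* Called whenever a buffer is accessed. */
static void
bump_usage(BufferSketch *buf)
{
    int     bump = (buf->kind == PAGE_INDEX) ? 2 : 1;   /* favor indexes */

    buf->usage_count += bump;
    if (buf->usage_count > MAX_USAGE_COUNT)
        buf->usage_count = MAX_USAGE_COUNT;
}

/* Clock sweep: decrement usage counts until a zero-count victim shows up. */
static int
clock_sweep(BufferSketch *pool, int npool, int *clock_hand)
{
    for (;;)
    {
        BufferSketch *buf = &pool[*clock_hand];

        *clock_hand = (*clock_hand + 1) % npool;
        if (!buf->valid || buf->usage_count == 0)
            return (int) (buf - pool);      /* index of the victim buffer */
        buf->usage_count--;
    }
}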

> A thought experiment about the U-shaped performance when your dataset
> fits in neither PG nor kernel cache, but would fit entirely in
> physical memory if you made either of the two caches big enough: I
> suppose when you read a page in, you could tell the kernel that you
> POSIX_FADV_DONTNEED it, and when you steal a clean PG buffer you could
> tell the kernel that you POSIX_FADV_WILLNEED its former contents (in
> advance somehow), on the theory that the coldest stuff in the PG cache
> should now become the hottest stuff in the OS cache. Of course that
> sucks, because the best the kernel can do then is go and read it from
> disk, and the goal is to avoid IO.

Not sure about that. I will say that the intuition that this is a good
area to work on is based on the challenges that we have with
shared_buffers in particular. The fact that shared_buffers is typically
sized no larger than, say, 16GB, while main memory can now easily be
far larger, is clearly a problem, and one that we're going to have to
get around to addressing directly.
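(Just to make the quoted thought experiment concrete, the kernel hints
Thomas describes would amount to posix_fadvise() calls roughly like the
following. A hypothetical sketch only, not something I'm advocating; the
fd and block number would come from wherever the buffer's file and
location are known.)

#include <fcntl.h>

#define BLCKSZ 8192

/* After reading a block into shared_buffers: tell the kernel that its
 * copy is no longer needed. */
static void
hint_dontneed(int fd, off_t blocknum)
{
    (void) posix_fadvise(fd, blocknum * BLCKSZ, BLCKSZ, POSIX_FADV_DONTNEED);
}

/* When evicting a clean buffer: tell the kernel that it will be needed,
 * on the theory that the coldest PG buffer should become the hottest OS
 * page. */
static void
hint_willneed(int fd, off_t blocknum)
{
    (void) posix_fadvise(fd, blocknum * BLCKSZ, BLCKSZ, POSIX_FADV_WILLNEED);
}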

ISTM that the whole shared_buffers issue should have little influence
on how much we're willing to remember about block popularity; why
should we track information about only the exact number of blocks that
will fit in shared_buffers at any one time? As you know, ARC/CAR
explicitly model blocks that are not cache resident by keeping them on
two ghost lists. That sounds like something that might have a
larger-than-expected benefit for us.
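To illustrate the ghost list idea (a minimal sketch of the concept only,
not the full ARC/CAR algorithm; all names here are made up): we would
remember just the identities of recently evicted blocks, with no data,
and a miss that hits a ghost list tells us that we evicted too eagerly
from that side, so the recency/frequency split adapts:

#include <stdbool.h>

typedef struct
{
    unsigned int rel_id;        /* which relation */
    unsigned int block_num;     /* which block within it */
} BlockTag;

typedef struct
{
    BlockTag   *tags;           /* identities of evicted blocks, MRU first */
    int         ntags;
    int         capacity;       /* bounded, roughly the cache size */
} GhostList;

static bool
ghost_contains(const GhostList *g, BlockTag tag)
{
    for (int i = 0; i < g->ntags; i++)
    {
        if (g->tags[i].rel_id == tag.rel_id &&
            g->tags[i].block_num == tag.block_num)
            return true;
    }
    return false;
}

/*
 * On a cache miss, consult the ghost lists: B1 holds blocks evicted after
 * a single access, B2 holds blocks evicted despite repeated access.  A hit
 * in either one nudges the target split between recency and frequency.
 */
static void
note_miss(BlockTag tag, const GhostList *b1, const GhostList *b2,
          int *target_recency)
{
    if (ghost_contains(b1, tag))
        (*target_recency)++;    /* recency side evicted too early: grow it */
    else if (ghost_contains(b2, tag))
        (*target_recency)--;    /* frequency side evicted too early: shrink recency */
}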

How much of a problem is it that we waste memory bandwidth copying to
and from the OS cache, particularly on large systems? Might that be the
bigger problem, but also one that can be addressed incrementally?

I think that I may be repeating some of what Andrey said, in another
way (not sure of that).

--
Peter Geoghegan
