Re: Clock sweep not caching enough B-Tree leaf pages?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2015-04-14 21:25:31
Message-ID: CA+TgmoZs6p3WZan8XisdL9s7pu6DxpX6cyHJBBeCVAKsM1FW7w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 16, 2014 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> Anyways, I'm still curious if you can post similar numbers basing the
>> throttling on gross allocation counts instead of time. Meaning: some
>> number of buffer allocations has to have occurred before you consider
>> eviction. Besides being faster I think it's a better implementation:
>> an intermittently loaded server will give more consistent behavior.
>
> Yeah --- I think wall-clock-based throttling is fundamentally the wrong
> thing anyway. Are we going to start needing a CPU speed measurement to
> tune the algorithm with? Not the place to be going. But driving it off
> the number of allocations that've been done could be sensible. (OTOH,
> that means you need a central counter, which itself would be a
> bottleneck.)
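
For concreteness, the allocation-count approach might look something
like the sketch below. (This is purely illustrative: the names and the
threshold are made up, not actual freelist/bgwriter code, and the
shared atomic counter is exactly the bottleneck Tom is worried about,
since every backend ends up hammering the same cache line.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint_fast64_t buffer_allocs;  /* the central counter */

#define ALLOCS_PER_BATCH 1000   /* arbitrary threshold */

/* Every buffer allocation bumps the shared counter. */
static void
count_buffer_alloc(void)
{
    atomic_fetch_add(&buffer_allocs, 1);
}

/* Has enough allocation activity occurred since we last checked? */
static bool
alloc_batch_due(uint64_t *last_seen)
{
    uint64_t    now = atomic_load(&buffer_allocs);

    if (now - *last_seen >= ALLOCS_PER_BATCH)
    {
        *last_seen = now;
        return true;
    }
    return false;
}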

So, I was thinking about this a little bit more today, prodded by my
coworker John Gorman. I'm wondering if we could drive this off of the
clock sweep; that is, every time the clock sweep touches a buffer, its
usage count goes down by one, but we also set two flag bits. Every
time the buffer gets touched thereafter, we check whether any flag
bits are set; if so, we clear one and increase the usage count, else
we do nothing. So the usage count can increase by at most two between
consecutive passes of the clock hand over that buffer. The advantage
is that, as with Peter's approach,
it is harder for the usage count of a buffer to max out - to get
there, you need sustained access over a longer period of time. But
it's not time-dependent, so replaying the same workload at twice the
speed or half the speed on faster or slower hardware doesn't change
the choice of which buffer to evict, which seems good. And it will
cause us to prefer buffers which are accessed repeatedly over a period
of time rather than buffers that are accessed a bunch of times in
quick succession and then not touched again for a while, which seems
like a good bet.
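
In code, the scheme might look something like this (a rough sketch
with made-up names, not actual bufmgr code; assume the buffer header
lock is held, and a maximum usage count of 5 as today):

#include <stdint.h>

#define SWEEP_FLAG1     0x01
#define SWEEP_FLAG2     0x02
#define MAX_USAGE_COUNT 5

typedef struct SketchBufferDesc
{
    uint16_t    flags;
    uint16_t    usage_count;
} SketchBufferDesc;

/*
 * Clock hand passes over the buffer: decrement the usage count, then
 * re-arm the two flag bits so that at most two future accesses can
 * bump the count before the hand comes around again.
 */
static void
sweep_touch(SketchBufferDesc *buf)
{
    if (buf->usage_count > 0)
        buf->usage_count--;
    buf->flags |= (SWEEP_FLAG1 | SWEEP_FLAG2);
}

/*
 * A backend touches the buffer: spend one flag bit, if any remain, to
 * buy one usage-count increment; otherwise do nothing.
 */
static void
access_touch(SketchBufferDesc *buf)
{
    if (buf->flags & SWEEP_FLAG2)
        buf->flags &= ~SWEEP_FLAG2;
    else if (buf->flags & SWEEP_FLAG1)
        buf->flags &= ~SWEEP_FLAG1;
    else
        return;                 /* both credits already spent */

    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
}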

I can't convince myself that this fully solves the (currently
existing) problem of usage counts increasing too fast. In theory,
every time the clock sweep hand advances, it decrements the usage
count of one buffer by one, but also makes it possible for the usage
count of that buffer to increase by up to two (or by only one if the
buffer previously had a usage count of five). So with the right
access pattern, it seems like you could still get to a place where a
typical buffer eviction has to do a lot of scanning before it finds a
victim buffer. As with the present algorithm, we'd be most likely to
have that problem when the buffer hit ratio is very high, so that
there are many opportunities for usage counts to go up between each
opportunity to push usage counts back down. But it seems like it
would at least limit the damage.
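
To see the limit-but-not-cure point concretely, here is a toy
simulation (illustrative only, not real code) of a workload hot enough
that every buffer is accessed at least twice between sweep passes:
each full rotation nets +1 per buffer (+2 from accesses, -1 from the
sweep), so usage counts still climb to the maximum and sit there.

#include <stdio.h>

#define NBUF      8
#define MAX_USAGE 5

int
main(void)
{
    int         usage[NBUF] = {0};
    int         credits[NBUF] = {0};    /* unspent flag bits, 0..2 */

    for (int rotation = 1; rotation <= 6; rotation++)
    {
        for (int i = 0; i < NBUF; i++)
        {
            /* sweep touch: decrement and re-arm both flag bits */
            if (usage[i] > 0)
                usage[i]--;
            credits[i] = 2;
        }
        for (int i = 0; i < NBUF; i++)
        {
            /* hot workload: two accesses per buffer per rotation */
            for (int a = 0; a < 2; a++)
            {
                if (credits[i] > 0)
                {
                    credits[i]--;
                    if (usage[i] < MAX_USAGE)
                        usage[i]++;
                }
            }
        }
        printf("after rotation %d: usage_count = %d\n",
               rotation, usage[0]);
    }
    return 0;
}

That prints 2, 3, 4, 5, 5, 5: every buffer is pinned at the ceiling
after four rotations, so an eviction would still have to sweep past a
lot of maxed-out buffers before finding a victim.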

I haven't tried this or anything, so this is just random brainstorming.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
