Re: Clock sweep not caching enough B-Tree leaf pages?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2015-04-20 18:53:59
Message-ID: 55354B47.8090805@BlueTreble.com
Lists: pgsql-hackers

On 4/20/15 11:11 AM, Robert Haas wrote:
> On Wed, Apr 15, 2015 at 5:06 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> This is my point though (you're right that "flushed" isn't always the
>> same as eviction but that's not the important point here). Right now
>> we only demote when we consider buffers for eviction. But we promote
>> when we pin buffers. Those two things aren't necessarily happening at
>> the same rate and in fact are often orders of magnitude different.
>
> I am absolutely, positively, violently in 100% agreement with this. I
> have made the same point before, but it sure is nice to hear someone
> else thinking about it the same way.

+1
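To make the rate mismatch concrete, here is a toy, single-threaded model of the arrangement Greg describes: a pin bumps usage_count (capped at 5, the value of BM_MAX_USAGE_COUNT), and the only demotion happens when a backend needs a victim and sweeps the hand. It is loosely modeled on PinBuffer() and StrategyGetBuffer(), but the names pin(), unpin(), get_victim() and the counters are hypothetical, and locking, spinlocks and the real buffer descriptors are all left out.

```c
#include <assert.h>

#define NBUFFERS  8
#define MAX_USAGE 5            /* same cap as BM_MAX_USAGE_COUNT */

typedef struct {
    int usage_count;           /* popularity, bumped on pin */
    int refcount;              /* > 0 while some backend has it pinned */
} Buffer;

static Buffer pool[NBUFFERS];
static int  hand = 0;          /* clock hand: next eviction candidate */
static long pins = 0, promotions = 0, demotions = 0;

/* Promotion happens on every pin, whether or not anyone needs a victim. */
static void pin(int buf)
{
    pool[buf].refcount++;
    pins++;
    if (pool[buf].usage_count < MAX_USAGE) {
        pool[buf].usage_count++;
        promotions++;
    }
}

static void unpin(int buf)
{
    assert(pool[buf].refcount > 0);
    pool[buf].refcount--;
}

/* Demotion happens only here, when a backend needs a buffer to evict. */
static int get_victim(void)
{
    int trycounter = NBUFFERS; /* give up after a full lap of pinned buffers */

    for (;;) {
        int buf = hand;
        hand = (hand + 1) % NBUFFERS;
        if (pool[buf].refcount == 0) {
            if (pool[buf].usage_count == 0)
                return buf;
            pool[buf].usage_count--;
            demotions++;
            trycounter = NBUFFERS;
        } else if (--trycounter == 0) {
            return -1;         /* every buffer is pinned */
        }
    }
}
```

With 1000 pins spread over 8 buffers and no eviction pressure, the model records 1000 pins and 0 demotions; the first eviction then has to perform all 40 demotions in one go, on whichever user backend happened to ask.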

>> What I'm saying is that we should demote a buffer every time we
>> promote a buffer. So every time we pin a buffer we should advance the
>> clock a corresponding amount. I know I'm being intentionally vague
>> about what the corresponding amount is. The important thing is that
>> the two should be tied together.
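Greg's suggestion can be sketched in the same toy model. He deliberately leaves the "corresponding amount" open; this sketch simply assumes one hand step per pin. The functions advance_clock(), pin() and unpin() are hypothetical illustrations, not PostgreSQL code, and the eviction routine (pick an unpinned buffer with usage_count == 0) is omitted.

```c
#include <assert.h>

#define NBUFFERS  8
#define MAX_USAGE 5

typedef struct {
    int usage_count;
    int refcount;
} Buffer;

static Buffer pool[NBUFFERS];
static int  hand = 0;
static long promotions = 0, demotions = 0;

/* Move the hand 'steps' buffers, demoting each unpinned buffer it passes. */
static void advance_clock(int steps)
{
    for (int i = 0; i < steps; i++) {
        Buffer *b = &pool[hand];
        hand = (hand + 1) % NBUFFERS;
        if (b->refcount == 0 && b->usage_count > 0) {
            b->usage_count--;
            demotions++;
        }
    }
}

/* Every pin pays for its promotion by advancing the clock (assumed: 1 step). */
static void pin(int buf)
{
    advance_clock(1);
    pool[buf].refcount++;
    if (pool[buf].usage_count < MAX_USAGE) {
        pool[buf].usage_count++;
        promotions++;
    }
}

static void unpin(int buf)
{
    assert(pool[buf].refcount > 0);
    pool[buf].refcount--;
}
```

Pinning a single buffer 800 times, the hand passes it every 8 pins, so it is demoted 99 times and re-promoted each time, ending at the cap. Demotions now scale with pin traffic rather than with eviction requests, which is the coupling Greg is after.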
>
> Yes, absolutely. If you tilt your head the right way, my proposal of
> limiting the number of promotions per clock sweep has the effect of
> tying buffer demotion and buffer promotion together much more tightly
> than is the case right now. You are limited to 2 promotions per
> demotion; and practically speaking not all buffers eligible to be
> promoted will actually get accessed, so the number of promotions per
> demotion will in reality be somewhere between 0 and 2. Ideally it
> would be exactly 1, but 1 +/- 1 is still a tighter limit than we have
> at present. Which is not to say there isn't some other idea that is
> better still.
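One way to read "limiting the number of promotions per clock sweep" is a per-buffer flag that a pin sets when it promotes and that the hand clears as it passes, so a buffer can gain at most one usage_count step between visits of the hand. This is an assumed mechanism for illustration; it does not try to reproduce the exact 2-promotions-per-demotion bound Robert mentions, which depends on details of his proposal not spelled out here. The promoted field and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS  8
#define MAX_USAGE 5

typedef struct {
    int  usage_count;
    int  refcount;
    bool promoted;             /* promoted since the hand last passed? */
} Buffer;

static Buffer pool[NBUFFERS];
static int  hand = 0;
static long pins = 0, promotions = 0, demotions = 0;

static void pin(int buf)
{
    Buffer *b = &pool[buf];
    b->refcount++;
    pins++;
    /* At most one promotion per buffer per pass of the clock hand. */
    if (!b->promoted && b->usage_count < MAX_USAGE) {
        b->usage_count++;
        b->promoted = true;
        promotions++;
    }
}

static void unpin(int buf)
{
    assert(pool[buf].refcount > 0);
    pool[buf].refcount--;
}

static int get_victim(void)
{
    int trycounter = NBUFFERS;

    for (;;) {
        Buffer *b = &pool[hand];
        int buf = hand;
        hand = (hand + 1) % NBUFFERS;
        b->promoted = false;   /* re-arm promotion as the hand passes */
        if (b->refcount == 0) {
            if (b->usage_count == 0)
                return buf;
            b->usage_count--;
            demotions++;
            trycounter = NBUFFERS;
        } else if (--trycounter == 0) {
            return -1;         /* every buffer is pinned */
        }
    }
}
```

Here 1000 pins of one buffer between sweeps yield a single promotion, so the next sweep needs only one demotion to bring it back to zero, instead of working off a full usage_count of 5.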

I think that would help, but it still leaves user backends trying to
advance the clock, which is quite painful. Has anyone tested running the
clock in the background? We need a wiki page with all the ideas that
have been tested around buffer management...
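A rough shape for "running the clock in the background": some separate process advances the hand ahead of demand and stashes usage_count == 0 buffers on a free list, and a user backend only sweeps itself when that list is empty. This is a single-threaded model with no locking, and bgclock_tick(), backend_get_victim(), the freelist array and the on_freelist flag are hypothetical; a real version would run concurrently against shared buffers and need proper synchronization.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS  8
#define MAX_USAGE 5

typedef struct {
    int  usage_count;
    int  refcount;
    bool on_freelist;          /* already handed to the free list? */
} Buffer;

static Buffer pool[NBUFFERS];
static int  hand = 0;
static int  freelist[NBUFFERS];        /* victims found ahead of demand */
static int  nfree = 0;
static long backend_sweep_steps = 0;   /* clock work done by user backends */

static void pin(int buf)
{
    pool[buf].refcount++;
    if (pool[buf].usage_count < MAX_USAGE)
        pool[buf].usage_count++;
}

static void unpin(int buf)
{
    assert(pool[buf].refcount > 0);
    pool[buf].refcount--;
}

/* Background side: run the clock ahead of demand and stash victims. */
static void bgclock_tick(int steps)
{
    for (int i = 0; i < steps && nfree < NBUFFERS; i++) {
        Buffer *b = &pool[hand];
        int buf = hand;
        hand = (hand + 1) % NBUFFERS;
        if (b->refcount > 0 || b->on_freelist)
            continue;
        if (b->usage_count > 0) {
            b->usage_count--;
        } else {
            b->on_freelist = true;
            freelist[nfree++] = buf;
        }
    }
}

/* Backend side: prefer a stashed victim; sweep only if none is left. */
static int backend_get_victim(void)
{
    while (nfree > 0) {
        int buf = freelist[--nfree];
        pool[buf].on_freelist = false;
        /* It may have been pinned again since it was stashed. */
        if (pool[buf].refcount == 0 && pool[buf].usage_count == 0)
            return buf;
    }

    int trycounter = NBUFFERS;
    for (;;) {
        Buffer *b = &pool[hand];
        int buf = hand;
        hand = (hand + 1) % NBUFFERS;
        backend_sweep_steps++;
        if (b->refcount == 0) {
            if (b->usage_count == 0)
                return buf;
            b->usage_count--;
            trycounter = NBUFFERS;
        } else if (--trycounter == 0) {
            return -1;         /* every buffer is pinned */
        }
    }
}
```

The recheck on pop matters: a buffer can be re-referenced after the background pass stashed it, and handing it out anyway would evict a page that just became hot again. As long as the background side keeps up, backend_sweep_steps stays at zero.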
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
