Re: Clock sweep not caching enough B-Tree leaf pages?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-16 07:53:07
Message-ID: 20140416075307.GC3906@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

It's good to see focus on this - some improvements around s_b are sorely
needed.

On 2014-04-14 10:11:53 -0700, Peter Geoghegan wrote:
> 1) Throttles incrementation of usage_count temporally. It becomes
> impossible to increment usage_count for any given buffer more
> frequently than every 3 seconds, while decrementing usage_count is
> totally unaffected.

I think this is unfortunately completely out of the question. For one, a
gettimeofday() for every buffer pin will become a significant performance
problem. Even the computation of the transaction/statement start/stop
timestamps shows up pretty heavily in profiles today, and those are far less
frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
is implemented as something more lightweight than a full syscall.
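To make the cost concrete: under the proposed throttle, every pin has to read the clock before it can decide whether an increment is allowed, even when the answer turns out to be "no". The following is a minimal sketch, assuming a hypothetical, stripped-down buffer header; `SketchBuffer`, `throttled_bump()` and `sketch_pin()` are illustrative names, not PostgreSQL code, and C11 `timespec_get()` stands in for `gettimeofday()`.

```c
#include <stdint.h>
#include <time.h>

#define SKETCH_MAX_USAGE_COUNT 5
#define BUMP_INTERVAL_NS (3LL * 1000000000LL) /* "at most every 3 seconds" */

/* Hypothetical, stripped-down buffer header (not PostgreSQL's BufferDesc). */
typedef struct
{
    int     usage_count;
    int64_t last_bump_ns;   /* time of the last permitted increment */
} SketchBuffer;

/* Wall-clock read, standing in for gettimeofday(); runs on every pin. */
static int64_t
now_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Increment usage_count only if BUMP_INTERVAL_NS has elapsed since the last
 * increment. Decrementing (done by the clock sweep) is not throttled. */
static int
throttled_bump(SketchBuffer *buf, int64_t now)
{
    if (buf->usage_count >= SKETCH_MAX_USAGE_COUNT)
        return 0;
    if (now - buf->last_bump_ns < BUMP_INTERVAL_NS)
        return 0;
    buf->usage_count++;
    buf->last_bump_ns = now;
    return 1;
}

/* Pin path: the clock is read unconditionally, before we know whether a
 * bump is even allowed. */
static void
sketch_pin(SketchBuffer *buf)
{
    (void) throttled_bump(buf, now_ns());
}
```

The comparison itself is cheap; the clock read on the hot pin path is what the objection is about.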

The other significant problem I see with this is that it's not adaptive
to the actual throughput of buffers in s_b. In many cases there are
hundreds of clock-sweep cycles through shared buffers in 3 seconds. By only
increasing the usagecount that often, you've destroyed what little
semblance of a working LRU there is right now.
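A toy model of this argument, assuming one heavily used buffer that is accessed at least once between consecutive passes of the clock hand, and measuring the 3-second throttle in sweep passes rather than seconds (`first_eviction_pass()`, `bump_every` and `MAX_USAGE` are hypothetical). Every pass decrements usage_count, but a bump is only allowed on every `bump_every`-th pass, so when passes are far more frequent than permitted bumps, even a saturated buffer is driven to zero within a handful of passes. Pins and concurrency are ignored.

```c
#define MAX_USAGE 5

/*
 * Hypothetical model of one hot buffer. It is accessed between every two
 * passes of the clock hand, but an access may only bump usage_count on every
 * `bump_every`-th pass (a stand-in for "once per 3 seconds" when the sweep
 * completes many passes in that time). Pins are ignored.
 *
 * Returns the (0-based) pass on which the hand finds usage_count == 0, i.e.
 * the buffer would be chosen as a victim, or -1 if it survives all passes.
 */
static int
first_eviction_pass(int start_usage, int passes, int bump_every)
{
    int usage = start_usage;

    for (int p = 0; p < passes; p++)
    {
        /* the access preceding this pass; bumps are throttled */
        if (p % bump_every == 0 && usage < MAX_USAGE)
            usage++;

        /* the clock hand reaches the buffer */
        if (usage == 0)
            return p;   /* victim: the next access has to go to disk */
        usage--;        /* decrements are never throttled */
    }
    return -1;
}
```

With `bump_every = 1` (no throttling) a buffer starting at 5 survives all 300 passes and the function returns -1; with `bump_every = 100`, standing in for hundreds of passes per 3 seconds, the same buffer is found at zero on pass 5.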

It also wouldn't work well for situations with a fast-changing
workload much larger than s_b. If you have frequent queries that take a
second or so and access some data repeatedly (index nodes or whatnot), only
increasing the usagecount once per 3 seconds will mean they'll continually
fall back to disk access.

> 2) Has usage_count saturate at 10 (i.e. BM_MAX_USAGE_COUNT = 10), not
> 5 as before. ... . This step on its own would be assumed extremely
> counter-productive by those in the know, but I believe that other
> measures ameliorate the downsides. I could be wrong about how true
> that is in other cases, but then the case helped here isn't what you'd
> call a narrow benchmark.

I don't see which of the mechanisms you have suggested would counter this.

I think having a more granular usagecount is a good idea, but I don't
think it can realistically be implemented with the current method of
choosing victim buffers. The number of cache-line misses incurred there is
already a major scalability limit; we surely can't make this even
worse. I think it'd be possible to get back to this if we had a better
bgwriter implementation.
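The cost concern can be illustrated with a simplified worst case: if every buffer is saturated at the maximum count and none is pinned, the hand must decrement each buffer max_usage times before it finds a victim, so it inspects max_usage × NBuffers + 1 headers. Raising the cap from 5 to 10 therefore roughly doubles that walk. This is a minimal sketch under those assumptions only (no pins, no concurrent backends, no bgwriter help); `inspections_until_victim()` is a hypothetical helper, not PostgreSQL's victim-selection code.

```c
#include <stdlib.h>

/*
 * Hypothetical worst case for victim search: every buffer is saturated at
 * max_usage and nothing is pinned. Counts how many buffer headers the clock
 * hand inspects (each one a potential cache-line miss) before it finds one
 * with usage_count == 0. Returns -1 on bad input or allocation failure.
 */
static long
inspections_until_victim(int nbuffers, int max_usage)
{
    int  *usage;
    long  steps = 0;
    int   hand = 0;

    if (nbuffers <= 0 || max_usage < 0)
        return -1;
    usage = malloc(sizeof(int) * (size_t) nbuffers);
    if (usage == NULL)
        return -1;
    for (int i = 0; i < nbuffers; i++)
        usage[i] = max_usage;

    for (;;)
    {
        steps++;                    /* inspect the header under the hand */
        if (usage[hand] == 0)
            break;                  /* found a victim */
        usage[hand]--;              /* decrement and move on */
        hand = (hand + 1) % nbuffers;
    }
    free(usage);
    return steps;
}
```

For 4 buffers this returns 21 with a cap of 5 and 41 with a cap of 10.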

Greetings,

Andres Freund