Re: Per-table random_page_cost for tables that we know are always cached

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Per-table random_page_cost for tables that we know are always cached
Date: 2008-04-23 05:31:53
Message-ID: Pine.GSO.4.64.0804230111220.22837@westnet.com
Lists: pgsql-hackers

On Tue, 22 Apr 2008, PFC wrote:

> Example : let's imagine a "cache priority" setting.

Which we can presume the DBA will set incorrectly because the tools needed
to set that right aren't easy to use.

> An alternative would be for the background writer to keep some stats and do
> the thing for us :
> - begin bgwriter scan
> - setup hashtable of [relid => page count]
> - at each page that is scanned, increment "page count" for this
> relation...

I've already got a plan sketched out that does this, which I didn't manage
to get finished in time for 8.3. What I wanted it for was not this
purpose, but instrumentation of what's in the cache that admins can look
at. Right now you can get that out of pg_buffercache, but that's kind of
intrusive because of the locks it takes. In many cases I'd be perfectly
happy with an approximation of what's inside the buffer cache, accumulated
while the page header is locked anyway as the BGW passes over it. And as
you note, having this data available can be handy for internal self-tuning
as well once it's out there.
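
For reference, the per-relation tally such a scan would produce is
essentially what pg_buffercache reports today, at the cost of those buffer
locks; this is roughly the example query from its contrib documentation:

    SELECT c.relname, count(*) AS buffers
    FROM pg_class c
         INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
         INNER JOIN pg_database d ON (b.reldatabase = d.oid
                                      AND d.datname = current_database())
    GROUP BY c.relname
    ORDER BY 2 DESC
    LIMIT 10;

A BGW-side approximation would give the same shape of answer without
grabbing every buffer header in one sweep.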

Jim threw out that you can just look at the page hit percentages instead.
That's not completely true. If you've had some nasty query blow out your
buffer cache, or if the server has been up a looong time and the total
stats don't really reflect recent reality, what's in the buffer cache and
what the stats say has historically been cached can diverge.
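
For what it's worth, those hit percentages come from the statistics
collector, along the lines of:

    SELECT relname, heap_blks_hit, heap_blks_read,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC
    LIMIT 10;

Since those counters accumulate from the last stats reset, they describe
history rather than current cache contents, which is exactly the
divergence above.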

> This would not examine whatever is in the OS' cache, though.

I don't know that it's too unrealistic to model the OS as just being an
extrapolated bigger version of the buffer cache. I can think of a couple
of ways those can diverge:

1) Popular pages that get high usage counts can end up with a higher
representation in shared_buffers than in the OS cache.

2) If you've been doing something like a bulk update, you can have lots
of pages that were written recently sitting in the OS cache that aren't
really accounted for fully in shared_buffers, because they never get a
high enough usage count to stay there (only used once) but can fill the OS
cache as they're spooled up to write.

I'm not sure that either of these cases is so strong that it invalidates
your basic idea, though. There's a pending 8.4 TODO to investigate whether
increasing the maximum usage count a buffer can get would be an
improvement. If that number got bumped up, I could see (2) becoming more
of a problem.
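
Stepping back, the planner's existing coarse version of the "OS cache is
just a bigger buffer cache" model is the global effective_cache_size
setting, which lumps shared_buffers and the OS cache into one number.
What's being kicked around here amounts to making that assumption
per-relation and data-driven:

    -- The planner's current, one-size-fits-all cache model:
    SHOW effective_cache_size;
    -- Typically set to a rough guess at shared_buffers plus the OS cache;
    -- the value below is just a placeholder, not a recommendation:
    SET effective_cache_size = '6GB';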

I'd be somewhat concerned about turning this mechanism on by default,
though, at least at first. A hybrid approach that gives the DBA some
control might work well. Maybe have an "adjust estimates for cache
contents" knob that you can toggle on a per-session or per-table basis?

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
