Re: Multiple buffer cache?

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexei Vladishev <alexei(dot)vladishev(at)zabbix(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Multiple buffer cache?
Date: 2010-02-08 16:47:57
Message-ID: 4B70403D.3010806@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Alvaro Herrera wrote:
> This should be easy to test, no? Just set some variable while running
> latency-critical queries that makes PinBuffer increment usage_count by
> more than one when pinning a buffer. Such a buffer would have its usage
> count typically higher than a buffer only used for regular queries.
>

Yeah, the only problem is that if the latency-critical ones don't happen
often enough, just the constant passes of the clock sweep hand over them
to allocate for the big activity might kick them out regardless. I was
thinking more along the lines of just adding another buffer flag that
protected them instead--once you get in the cache, if you came from a
relation that's pinned, you never leave again if that flag is on. That's
more along the lines of what they think they want--whether or not that's
really best would take some profiling to prove.

> To make this work we'd probably need a slightly larger value of
> BM_MAX_USAGE_COUNT, I think.
>

That alone might help their case significantly, if in fact the
latency-critical bits do happen often. Given a repeatable test case, the
first thing I'd consider doing for this problem is instrumenting the
distribution of data in the buffer cache with usage counts, double
BM_MAX_USAGE_COUNT, then run it again and see what's different. Would
learn a lot with that test.

As far as the sort of issues that Tom and Greg both mentioned, there are
plenty of situations where people are willing to trade-off a significant
amount of average or best performance in return for lowering worst-case
latency for some queries. I'm not completely aligned with presuming the
database will always have enough information to make that call on its
own, and "pinning hints" are common enough in other systems that maybe
they're worth an investigation here too. Some of the things people ask
for optimizer hints for might even be addressed as a side-effect
here--knowing the relation was likely nailed down in the cache would
certainly adjust the plan costs in a way the optimizer could use. That's
a slightly different case than the direct requests for optimizer hints,
which are usually a bad idea because they encourage the optimizer to
make decisions based on data that's likely out of date.

We (myself, Greg Stark, Robert Haas, Stephen Frost) had a little meeting
last year on the topic of more advanced buffer cache methods and what
the optimizer might do with them, and I still owe everyone a written
report on that I haven't forgotten about. While I agree that wandering
in that direction is the ideal approach here, there may be a role for
relation pinning in all this somewhere too.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Janning Vygen 2010-02-08 17:10:08 second concurrent update takes forever
Previous Message Devrim GÜNDÜZ 2010-02-08 16:37:00 Re: [Pgsqlrpms-hackers] weird bug in rebuilding RPMs