Re: How does PG know if data is in memory?

From: <gnuoytr(at)rcn(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: How does PG know if data is in memory?
Date: 2010-10-12 03:11:34
Message-ID: 20101011231134.ANB76937@ms14.lnh.mail.rcn.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

An approach that works can be found in DB2, and likely elsewhere.

The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement.

The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes?

Robert

---- Original message ----
>Date: Mon, 11 Oct 2010 22:59:28 -0400
>From: pgsql-performance-owner(at)postgresql(dot)org (on behalf of Robert Haas <robertmhaas(at)gmail(dot)com>)
>Subject: Re: [PERFORM] How does PG know if data is in memory?
>To: Jeremy Harris <jgh(at)wizmail(dot)org>
>Cc: pgsql-performance(at)postgresql(dot)org
>
>On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> On 10/04/2010 04:22 AM, Greg Smith wrote:
>>>
>>> I had a brain-storming session on this subject with a few of the hackers
>>> in the community in this area a while back I haven't had a chance to do
>>> something with yet (it exists only as a pile of scribbled notes so far).
>>> There's a couple of ways to collect data on what's in the database and OS
>>> cache, and a couple of ways to then expose that data to the optimizer. But
>>> that needs to be done very carefully, almost certainly as only a manual
>>> process at first, because something that's producing cache feedback all of
>>> the time will cause plans to change all the time, too. Where I suspect this
>>> is going is that we may end up tracking various statistics over time, then
>>> periodically providing a way to export a mass of "typical % cached" data
>>> back to the optimizer for use in plan cost estimation purposes. But the idea
>>> of monitoring continuously and always planning based on the most recent data
>>> available has some stability issues, both from a "too many unpredictable
>>> plan changes" and a "ba
>>
>> d
>>>
>>> short-term feedback loop" perspective, as mentioned by Tom and Kevin
>>> already.
>>
>> Why not monitor the distribution of response times, rather than "cached" vs.
>> not?
>>
>> That a) avoids the issue of discovering what was a cache hit  b) deals
>> neatly with
>> multilevel caching  c) feeds directly into cost estimation.
>
>I was hot on doing better cache modeling a year or two ago, but the
>elephant in the room is that it's unclear that it solves any
>real-world problem. The OP is clearly having a problem, but there's
>not enough information in his post to say what is actually causing it,
>and it's probably not caching effects. We get occasional complaints
>of the form "the first time I run this query it's slow, and then after
>that it's fast" but, as Craig Ringer pointed out upthread, not too
>many. And even with respect to the complaints we do get, it's far
>from clear that the cure is any better than the disease. Taking
>caching effects into account could easily result in the first
>execution being slightly less slow and all of the subsequent
>executions being moderately slow. That would not be an improvement
>for most people. The reports that seem really painful to me are the
>ones where people with really big machines complain of needing HOURS
>for the cache to warm up, and having the system bogged down to a
>standstill until then. But changing the cost model isn't going to
>help them either.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Neil Whelchel 2010-10-12 03:42:42 Re: Slow count(*) again...
Previous Message Robert Haas 2010-10-12 02:59:28 Re: How does PG know if data is in memory?