Re: How does PG know if data is in memory?

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Fabrício dos Anjos Silva <fabricio(dot)silva(at)linkcom(dot)com(dot)br>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How does PG know if data is in memory?
Date: 2010-10-04 20:39:40
Message-ID: AANLkTinwgDn5ss95v6HZ_CmmDMrmLrNUhCFLTTwOYXDK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2010/10/4 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given relation/index
>> has been accessed and how much has been read from it recently. A sort of
>> age-scaled blocks-per-second measure that includes both cached and uncached
>> (disk) reads. This would let the planner know how likely parts of a given
>> index/relation are to be cached in memory without imposing the cost of
>> tracking the cache in detail. I'm still not sure it'd be all that useful,
>> though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea of
> what someone dubbed a "heat map" that tracked which parts of the relation
> where actually the ones in RAM, the other issue you mentioned.  The problem
> facing a lot of development possibilities in this area is that we don't have
> any continuous benchmarking of complicated plans going on right now.  So if
> something really innovative is done, there's really no automatic way to test
> the result and then see what types of plans it improves and what it makes
> worse.  Until there's some better performance regression work like that
> around, development on the optimizer has to favor being very conservative.

* tracking specific block is not very easy because of readahead. You
end-up measuring exactly if a block was in memory at the moment you
requested it physicaly, not at the moment the first seek/fread happen.
It is still interesting stat imho.

I wonder how that can add value to the planner.

* If the planner knows more about the OS cache it can guess the
effective_cache_size on its own, which is probably already nice to
have.

Extract from postgres code:
* We use an approximation proposed by Mackert and Lohman, "Index Scans
* Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
* on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.

Planner use that in conjunction with effective_cache_size to guess if
it is interesting to scan the index.
All is to know if this model is still valid in front of a more precise
knowledge of the OS page cache... and also if it matches how different
systems like windows and linux handle page cache.

Hooks around cost estimation should help writing a module to rethink
that part of the planner and make it use the statistics about cache. I
wonder if adding such hooks to core impact its performances ? Anyway
doing that is probably the easier and shorter way to test the
behavior.

>
> --
> Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services and Support  www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeremy Harris 2010-10-04 22:47:16 Re: How does PG know if data is in memory?
Previous Message Scott Marlowe 2010-10-04 19:35:44 Re: [PERFORM] MIT benchmarks pgsql multicore (up to 48)performance