From: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Fabrício dos Anjos Silva <fabricio(dot)silva(at)linkcom(dot)com(dot)br>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: How does PG know if data is in memory? |
Date: | 2010-10-04 20:39:40 |
Message-ID: | AANLkTinwgDn5ss95v6HZ_CmmDMrmLrNUhCFLTTwOYXDK@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
2010/10/4 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given relation/index
>> has been accessed and how much has been read from it recently. A sort of
>> age-scaled blocks-per-second measure that includes both cached and uncached
>> (disk) reads. This would let the planner know how likely parts of a given
>> index/relation are to be cached in memory without imposing the cost of
>> tracking the cache in detail. I'm still not sure it'd be all that useful,
>> though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea of
> what someone dubbed a "heat map" that tracked which parts of the relation
> where actually the ones in RAM, the other issue you mentioned. The problem
> facing a lot of development possibilities in this area is that we don't have
> any continuous benchmarking of complicated plans going on right now. So if
> something really innovative is done, there's really no automatic way to test
> the result and then see what types of plans it improves and what it makes
> worse. Until there's some better performance regression work like that
> around, development on the optimizer has to favor being very conservative.
* tracking specific block is not very easy because of readahead. You
end-up measuring exactly if a block was in memory at the moment you
requested it physicaly, not at the moment the first seek/fread happen.
It is still interesting stat imho.
I wonder how that can add value to the planner.
* If the planner knows more about the OS cache it can guess the
effective_cache_size on its own, which is probably already nice to
have.
Extract from postgres code:
* We use an approximation proposed by Mackert and Lohman, "Index Scans
* Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
* on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.
Planner use that in conjunction with effective_cache_size to guess if
it is interesting to scan the index.
All is to know if this model is still valid in front of a more precise
knowledge of the OS page cache... and also if it matches how different
systems like windows and linux handle page cache.
Hooks around cost estimation should help writing a module to rethink
that part of the planner and make it use the statistics about cache. I
wonder if adding such hooks to core impact its performances ? Anyway
doing that is probably the easier and shorter way to test the
behavior.
>
> --
> Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services and Support www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Jeremy Harris | 2010-10-04 22:47:16 | Re: How does PG know if data is in memory? |
Previous Message | Scott Marlowe | 2010-10-04 19:35:44 | Re: [PERFORM] MIT benchmarks pgsql multicore (up to 48)performance |