Quick Links

Re: How does PG know if data is in memory?

From:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Fabrício dos Anjos Silva <fabricio(dot)silva(at)linkcom(dot)com(dot)br>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: How does PG know if data is in memory?
Date:	2010-10-04 20:39:40
Message-ID:	AANLkTinwgDn5ss95v6HZ_CmmDMrmLrNUhCFLTTwOYXDK@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

2010/10/4 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given relation/index
>> has been accessed and how much has been read from it recently. A sort of
>> age-scaled blocks-per-second measure that includes both cached and uncached
>> (disk) reads. This would let the planner know how likely parts of a given
>> index/relation are to be cached in memory without imposing the cost of
>> tracking the cache in detail. I'm still not sure it'd be all that useful,
>> though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea of
> what someone dubbed a "heat map" that tracked which parts of the relation
> where actually the ones in RAM, the other issue you mentioned. The problem
> facing a lot of development possibilities in this area is that we don't have
> any continuous benchmarking of complicated plans going on right now. So if
> something really innovative is done, there's really no automatic way to test
> the result and then see what types of plans it improves and what it makes
> worse. Until there's some better performance regression work like that
> around, development on the optimizer has to favor being very conservative.

* tracking specific block is not very easy because of readahead. You
end-up measuring exactly if a block was in memory at the moment you
requested it physicaly, not at the moment the first seek/fread happen.
It is still interesting stat imho.

I wonder how that can add value to the planner.

* If the planner knows more about the OS cache it can guess the
effective_cache_size on its own, which is probably already nice to
have.

Extract from postgres code:
* We use an approximation proposed by Mackert and Lohman, "Index Scans
* Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
* on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.

Planner use that in conjunction with effective_cache_size to guess if
it is interesting to scan the index.
All is to know if this model is still valid in front of a more precise
knowledge of the OS page cache... and also if it matches how different
systems like windows and linux handle page cache.

Hooks around cost estimation should help writing a module to rethink
that part of the planner and make it use the statistics about cache. I
wonder if adding such hooks to core impact its performances ? Anyway
doing that is probably the easier and shorter way to test the
behavior.

>
> --
> Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services and Support www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

In response to

Re: How does PG know if data is in memory? at 2010-10-04 03:29:03 from Greg Smith

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jeremy Harris	2010-10-04 22:47:16	Re: How does PG know if data is in memory?
Previous Message	Scott Marlowe	2010-10-04 19:35:44	Re: [PERFORM] MIT benchmarks pgsql multicore (up to 48)performance