2011/2/25 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
>> All that having been said, I think that while Josh is thinking fuzzily
>> about the mathematics of his proposal, the basic idea is pretty
>> sensible. It is not easy - likely not possible - for the system to
>> have a good idea which things will be in some kind of cache at the
>> time the query is executed; it could even change mid-query. The
>> execution of one part of the query could evict from the cache data
>> which some other part of the plan assumed would be cached. But DBAs
>> frequently have a very good idea of which stuff is in cache - they can
>> make observations over a period of time and then adjust settings and
>> then observe some more and adjust some more.
> I believe we can maintain a small map of area of a relation which are
> in the OS buffer cache (shared buffers move more), or at least a
> percentage of the relation in OS cache. Getting autovacuum daemon
> being able to update those maps/counters might be enought and easy to
> do, it is really near what auto-analyze do. My observation is that
> the percentage in cache is stable on a production workload after some
> tens of minutes needed to warm the server.
I don't think we can assume that will be true in all workloads.
Imagine a server doing batch processing. People submit large batches
of work that take, say, an hour to complete. Not all batches use the
same set of tables - maybe they even run in different databases.
After a big batch process finishes crunching numbers in database A,
very little of database B will be cached. But it's not necessarily
right to assume that when we start queries for a new batch in database
B, although it's more likely to be right for large tables (which will
take a long time to get cached meaningfully, if they ever do) than
small ones. Also, it could lead to strange issues where batches run
much faster or slower depending on which batch immediately proceeded
them. If we're going to do something a lot of times, it'd be better
to bite the bullet and read it all in rather than going to more work
elsewhere, but if we're only going to touch it once, then not so much.
You might also have this issue on systems that run OLTP workloads all
day and then do some batch processing at night to get ready for the
next business day. Kevin Grittner wrote previously about those jobs
needing some different settings in his environment (I'm not
remembering which settings at the moment). Suppose that the batch
process is going to issue a query that can be planned in one of two
possible ways. One way involves reading 10% of a relation, and the
other way involves reading the whole thing. The first plan takes 200
s to execute if the relation is not cached, and 180 s if the relevant
portion is cached. The second plan takes 300 s to execute if the
relation is not cached, and 100 s if it is cached. At the start of
the batch run, the relation won't be cached, because it's used *only*
by the overnight job and not by the daily OLTP traffic. Which way
should we execute the query?
The answer is that if the batch job only needs to execute that query
*once*, we should do it the first way. But if it needs to execute it
three or more times, the second way is better, but only if we use the
second plan every time. If we start out with the first plan, we're
always better off sticking with it *unless* we know that we're going
to repeat the query at least twice more after the iteration we're
currently planning. To make the right decision, the query planner
needs a crystal ball. Or, a little help from the DBA.
> What should really help here is to have hooks in the cost functions to
> test those ideas without the need to patch postgresql-core a lot. Will
> it be ok to have hooks or will it add to much CPU consumption in a
> sensible part of the code ?
Depends on where you put them, I guess. Hooks are pretty cheap, but
they're also pretty hard to use.
The Enterprise PostgreSQL Company
In response to
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2011-02-25 18:00:28|
|Subject: Re: wCTE: why not finish sub-updates at the end, not the beginning? |
|Previous:||From: Merlin Moncure||Date: 2011-02-25 17:36:31|
|Subject: Re: wCTE: why not finish sub-updates at the end, not the beginning?|