2011/2/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Josh Berkus wrote:
>>> On 2/23/11 7:10 AM, Robert Haas wrote:
>>> > IME, most bad query plans are caused by either incorrect
>>> > estimates of selectivity, or wrongheaded notions about what's likely
>>> > to be cached. If we could find a way, automated or manual, of
>>> > providing the planner some better information about the facts of life
>>> > in those areas, I think we'd be way better off. I'm open to ideas
>>> > about what the best way to do that is.
>>> As previously discussed, I'm fine with approaches which involve
>>> modifying database objects. These are auditable and centrally managable
>>> and aren't devastating to upgrades.
>>> So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way
>>> that decorating queries would not.
>>> Similiarly, I would love to be able to set "cache %" on a per-relation
>>> basis, and override the whole dubious calculation involving
>>> random_page_cost for scans of that table.
>> We should just fine a way of checking what percentage of a table is
>> already in the shared buffers. That doesn't help us with the kernel
>> cache, but it would be a good start and something that doesn't require
>> user tuning.
> You're reinventing a wheel that's already been discarded multiple
> times. There are at least four separate problems:
> 1. The percentage of the table which is cached in shared_buffers at
> plan time need not match the percentage that is cached at execution
> time. A delay of even a few seconds between planning and execution
> could make the numbers totally different, and plans can be cached for
> much longer than that.
> 2. Because shared_buffers can turn over quite quickly, planning the
> statement multiple times in relatively quick succession could give
> different results each time. Previous discussions on this topic have
> concluded that DBAs hate plan instability, and hate GEQO because it
> causes plan instability, and this would inject plan instabiilty into
> the main planner.
> 3. The percentage of the table which is cached in shared_buffers is
> not necessarily representative of the percentage which is cached in
> general. On a large machine, shared_buffers may be less than 10% of
> the total cache. It would be unwise to make guesses about what is and
> is not cached based on a small percentage of the cache.
> 4. Even if we could accurately estimate the percentage of the table
> that is cached, what then? For example, suppose that a user issues a
> query which retrieves 1% of a table, and we know that 1% of that table
> is cached. How much of the data that the user asked for is cache?
> Hard to say, right? It could be none of it or all of it. The second
> scenario is easy to imagine - just suppose the query's been executed
> twice. The first scenario isn't hard to imagine either.
> One idea Tom and I kicked around previously is to set an assumed
> caching percentage for each table based on its size relative to
> effective_cache_size - in other words, assume that the smaller a table
> is, the more of it will be cached. Consider a system with 8GB of RAM,
> and a table which is 64kB. It is probably unwise to make any plan
> based on the assumption that that table is less than fully cached. If
> it isn't before the query executes, it soon will be. Going to any
> amount of work elsewhere in the plan to avoid the work of reading that
> table in from disk is probably a dumb idea. Of course, one downside
> of this approach is that it doesn't know which tables are hot and
> which tables are cold, but it would probably still be an improvement
> over the status quo.
Yes, good idea.
> All that having been said, I think that while Josh is thinking fuzzily
> about the mathematics of his proposal, the basic idea is pretty
> sensible. It is not easy - likely not possible - for the system to
> have a good idea which things will be in some kind of cache at the
> time the query is executed; it could even change mid-query. The
> execution of one part of the query could evict from the cache data
> which some other part of the plan assumed would be cached. But DBAs
> frequently have a very good idea of which stuff is in cache - they can
> make observations over a period of time and then adjust settings and
> then observe some more and adjust some more.
I believe we can maintain a small map of area of a relation which are
in the OS buffer cache (shared buffers move more), or at least a
percentage of the relation in OS cache. Getting autovacuum daemon
being able to update those maps/counters might be enought and easy to
do, it is really near what auto-analyze do. My observation is that
the percentage in cache is stable on a production workload after some
tens of minutes needed to warm the server.
What should really help here is to have hooks in the cost functions to
test those ideas without the need to patch postgresql-core a lot. Will
it be ok to have hooks or will it add to much CPU consumption in a
sensible part of the code ?
> PostgreSQL is extremely easy to administer compared with some of its
> competitors, and it's frequently necessary to change very little. But
> there's a difference between what you absolutely have to change to
> make it work and what you have the option to change when necessary.
> We need to decrease the amount of stuff in the first category (as we
> recently did with wal_buffers) and increase the amount of stuff in the
> second category. People coming from Oracle are not favorably
> impressed either by the amount of monitoring data PostgreSQL can
> gather or by the number of knobs that are available to fix problems
> when they occur. We don't need to have as many knobs as Oracle and we
> probably don't want to, and for that matter we probably couldn't if we
> did want to for lack of manpower, but that doesn't mean we should have
> Maybe sometime during my life someone will invent a self-driving car
> where I can just get in and say "take me to Bruce's house" and an hour
> later it'll parallel park at the end of his driveway. That will be
> great. But I think that the first generation of self-driving cars
> will still have a steering wheel, and a brake pedal, and a little
> switch that turns self-driving mode OFF. It is one thing to say that
> you have a system which is really good and does not need much manual
> adjustment, and we have such a system. It is another thing altogether
> to systematically remove, or refuse to add, any controls that might
> permit adjustment in cases where it is necessary. That can be the
> right thing to do if the system is of such high quality that such
> manual adjustment is so unlikely to be necessary as not to be worth
> worrying about; but we are not at that point. And frankly I think if
> we don't add some knobs to let this stuff be tuned manually, we will
> never get the experience we need to write good auto-tuning algorithms.
> Greg Smith would not have known what algorithm to propose for tuning
> the wal_buffers option if he had not had a bunch of experience setting
> it by hand.
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
In response to
pgsql-hackers by date
|Next:||From: Rod Taylor||Date: 2011-02-25 17:03:58|
|Subject: Re: WIP: cross column correlation ...|
|Previous:||From: Greg Smith||Date: 2011-02-25 16:40:27|
|Subject: Re: sync rep design architecture (was "disposition of remaining patches")|