From: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at> |
Subject: | Re: WIP: cross column correlation ... |
Date: | 2011-02-25 22:35:45 |
Message-ID: | AANLkTimb9AjbgGZePLx43FjAvM_GL+PL+htJKifn0M51@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2011/2/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> 2011/2/25 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
>>> All that having been said, I think that while Josh is thinking fuzzily
>>> about the mathematics of his proposal, the basic idea is pretty
>>> sensible. It is not easy - likely not possible - for the system to
>>> have a good idea which things will be in some kind of cache at the
>>> time the query is executed; it could even change mid-query. The
>>> execution of one part of the query could evict from the cache data
>>> which some other part of the plan assumed would be cached. But DBAs
>>> frequently have a very good idea of which stuff is in cache - they can
>>> make observations over a period of time and then adjust settings and
>>> then observe some more and adjust some more.
>>
>> I believe we can maintain a small map of area of a relation which are
>> in the OS buffer cache (shared buffers move more), or at least a
>> percentage of the relation in OS cache. Getting autovacuum daemon
>> being able to update those maps/counters might be enought and easy to
>> do, it is really near what auto-analyze do. My observation is that
>> the percentage in cache is stable on a production workload after some
>> tens of minutes needed to warm the server.
>
> I don't think we can assume that will be true in all workloads.
> Imagine a server doing batch processing. People submit large batches
> of work that take, say, an hour to complete. Not all batches use the
> same set of tables - maybe they even run in different databases.
> After a big batch process finishes crunching numbers in database A,
> very little of database B will be cached. But it's not necessarily
> right to assume that when we start queries for a new batch in database
> B, although it's more likely to be right for large tables (which will
> take a long time to get cached meaningfully, if they ever do) than
> small ones. Also, it could lead to strange issues where batches run
> much faster or slower depending on which batch immediately proceeded
> them. If we're going to do something a lot of times, it'd be better
> to bite the bullet and read it all in rather than going to more work
> elsewhere, but if we're only going to touch it once, then not so much.
>
> You might also have this issue on systems that run OLTP workloads all
> day and then do some batch processing at night to get ready for the
> next business day. Kevin Grittner wrote previously about those jobs
> needing some different settings in his environment (I'm not
> remembering which settings at the moment). Suppose that the batch
> process is going to issue a query that can be planned in one of two
> possible ways. One way involves reading 10% of a relation, and the
> other way involves reading the whole thing. The first plan takes 200
> s to execute if the relation is not cached, and 180 s if the relevant
> portion is cached. The second plan takes 300 s to execute if the
> relation is not cached, and 100 s if it is cached. At the start of
> the batch run, the relation won't be cached, because it's used *only*
> by the overnight job and not by the daily OLTP traffic. Which way
> should we execute the query?
>
> The answer is that if the batch job only needs to execute that query
> *once*, we should do it the first way. But if it needs to execute it
> three or more times, the second way is better, but only if we use the
> second plan every time. If we start out with the first plan, we're
> always better off sticking with it *unless* we know that we're going
> to repeat the query at least twice more after the iteration we're
> currently planning. To make the right decision, the query planner
> needs a crystal ball. Or, a little help from the DBA.
Yes, we are talking of improving some part of the model.
Some workloads are dramatic and need special customization. This is true.
Still there is a path of improvement, and probably it will remain a
path of improvement after the current model is updated.
I am not proposing something to solve all the issues, but way more
interesting IMHO than just letting the dba say : 'this table is in
cache at XX%'.
Btw, pgfincore already do solve the usecase you provide by helping the
DBA to prepare its batch processing, so in some sense I am familiar
with what you describe (take the second plan, pgfincore will preload
in the background, and your query will be done in 100s from the
first).
>
>> What should really help here is to have hooks in the cost functions to
>> test those ideas without the need to patch postgresql-core a lot. Will
>> it be ok to have hooks or will it add to much CPU consumption in a
>> sensible part of the code ?
>
> Depends on where you put them, I guess. Hooks are pretty cheap, but
> they're also pretty hard to use.
Yes, it will be easier to make an extension, have people testing it
and validate or not the 'new' model
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2011-02-25 23:41:09 | Re: WIP: cross column correlation ... |
Previous Message | Robert Haas | 2011-02-25 22:25:26 | Re: disposition of remaining patches |