Re: WIP: cross column correlation ...

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: WIP: cross column correlation ...
Date: 2011-02-25 22:35:45
Message-ID: AANLkTimb9AjbgGZePLx43FjAvM_GL+PL+htJKifn0M51@mail.gmail.com
Lists: pgsql-hackers

2011/2/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> 2011/2/25 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
>>> All that having been said, I think that while Josh is thinking fuzzily
>>> about the mathematics of his proposal, the basic idea is pretty
>>> sensible.  It is not easy - likely not possible - for the system to
>>> have a good idea which things will be in some kind of cache at the
>>> time the query is executed; it could even change mid-query.  The
>>> execution of one part of the query could evict from the cache data
>>> which some other part of the plan assumed would be cached.  But DBAs
>>> frequently have a very good idea of which stuff is in cache - they can
>>> make observations over a period of time and then adjust settings and
>>> then observe some more and adjust some more.
>>
>> I believe we can maintain a small map of the areas of a relation
>> which are in the OS buffer cache (shared buffers move more), or at
>> least a percentage of the relation in OS cache. Having the
>> autovacuum daemon update those maps/counters might be enough, and
>> easy to do; it is really close to what auto-analyze does.  My
>> observation is that the percentage in cache is stable on a
>> production workload after the few tens of minutes needed to warm
>> the server.
>
> I don't think we can assume that will be true in all workloads.
> Imagine a server doing batch processing.  People submit large batches
> of work that take, say, an hour to complete.  Not all batches use the
> same set of tables - maybe they even run in different databases.
> After a big batch process finishes crunching numbers in database A,
> very little of database B will be cached.  But it's not necessarily
> right to assume that will still be the case when we start queries for
> a new batch in database B, although it's more likely to be right for
> large tables (which will take a long time to get cached meaningfully,
> if they ever do) than for small ones.  Also, it could lead to strange
> issues where batches run much faster or slower depending on which
> batch immediately preceded them.  If we're going to do something a
> lot of times, it'd be better to bite the bullet and read it all in
> rather than doing more work elsewhere, but if we're only going to
> touch it once, then not so much.
>
> You might also have this issue on systems that run OLTP workloads all
> day and then do some batch processing at night to get ready for the
> next business day.  Kevin Grittner wrote previously about those jobs
> needing some different settings in his environment (I'm not
> remembering which settings at the moment).  Suppose that the batch
> process is going to issue a query that can be planned in one of two
> possible ways.  One way involves reading 10% of a relation, and the
> other way involves reading the whole thing.  The first plan takes 200
> s to execute if the relation is not cached, and 180 s if the relevant
> portion is cached.  The second plan takes 300 s to execute if the
> relation is not cached, and 100 s if it is cached.  At the start of
> the batch run, the relation won't be cached, because it's used *only*
> by the overnight job and not by the daily OLTP traffic.  Which way
> should we execute the query?
>
> The answer is that if the batch job only needs to execute that query
> *once*, we should do it the first way.  But if it needs to execute it
> three or more times, the second way is better, but only if we use the
> second plan every time.  If we start out with the first plan, we're
> always better off sticking with it *unless* we know that we're going
> to repeat the query at least twice more after the iteration we're
> currently planning.  To make the right decision, the query planner
> needs a crystal ball.  Or, a little help from the DBA.

Yes, we are talking about improving part of the model.
Some workloads are extreme and need special customization, that is true.

Still, there is room for improvement here, and there will probably
remain room for improvement after the current model is updated.

I am not proposing something that solves every issue, but it is IMHO
far more interesting than just letting the DBA say: 'this table is in
cache at XX%'.
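
To make the 'XX% in cache' part less hand-wavy: measuring it is the
easy bit. Here is a minimal sketch (Linux only, a single segment file,
error handling trimmed, and the path is just an example) of the kind of
probe pgfincore is built on, mincore(2) over a mapped relation file:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fraction of a file resident in the OS page cache, via mincore(2). */
static double
cached_fraction(const char *path)
{
    int         fd = open(path, O_RDONLY);
    struct stat st;
    size_t      pagesize = sysconf(_SC_PAGESIZE);
    size_t      npages;
    size_t      resident = 0;
    unsigned char *vec;
    void       *map;

    if (fd < 0)
        return 0.0;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
    {
        close(fd);
        return 0.0;
    }

    npages = (st.st_size + pagesize - 1) / pagesize;
    vec = malloc(npages);
    map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    if (vec != NULL && map != MAP_FAILED &&
        mincore(map, st.st_size, vec) == 0)
    {
        size_t      i;

        for (i = 0; i < npages; i++)
            if (vec[i] & 1)
                resident++;
    }

    if (map != MAP_FAILED)
        munmap(map, st.st_size);
    free(vec);
    close(fd);

    return (double) resident / (double) npages;
}

int
main(void)
{
    /* example path only; a real tool walks every segment of the relation */
    printf("%.0f%% in cache\n",
           100.0 * cached_fraction("base/16384/16385"));
    return 0;
}

Autovacuum (or any low-frequency background process) could refresh a
per-relation number like that, which is all the planner would need.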

Btw, pgfincore already solves the use case you describe by helping the
DBA prepare for batch processing, so in some sense I am familiar with
this scenario: take the second plan, let pgfincore preload the relation
in the background, and the query runs in 100 s from the first
execution.
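
To spell out the arithmetic on your numbers (first run cold, later runs
warm, each run leaving its portion cached for the next, as in your
description):

  plan 1: 200 + 180 * (n - 1) seconds for n executions
  plan 2: 300 + 100 * (n - 1) seconds for n executions

  n = 1:  200 vs 300  -> plan 1 wins
  n = 2:  380 vs 400  -> plan 1 wins
  n = 3:  560 vs 500  -> plan 2 wins

So the break-even really is at three executions, and with a background
preload the second plan can win from the first run (100 s each), at the
price of doing the preload before the batch starts.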

>
>> What would really help here is to have hooks in the cost functions,
>> so that we can test those ideas without having to patch
>> postgresql-core heavily. Would it be OK to add hooks, or would they
>> add too much CPU overhead in such a performance-sensitive part of
>> the code?
>
> Depends on where you put them, I guess.  Hooks are pretty cheap, but
> they're also pretty hard to use.

Yes, it will be easier to build an extension, have people test it,
and then validate (or not) the 'new' model.
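
To make that concrete, the kind of hook I have in mind would be roughly
the following; the hook itself is purely hypothetical (nothing like
cost_seqscan_hook exists in core today) and rel_cached_fraction() is a
placeholder for whatever estimate we settle on (mincore-based map,
DBA-provided percentage, ...):

#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/cost.h"

/* hypothetical hook point, called at the end of cost_seqscan() */
typedef void (*cost_seqscan_hook_type) (Path *path,
                                        PlannerInfo *root,
                                        RelOptInfo *baserel);
extern PGDLLIMPORT cost_seqscan_hook_type cost_seqscan_hook;

/* hypothetical: would consult the per-relation cache map or a setting */
static double
rel_cached_fraction(RelOptInfo *baserel)
{
    return 0.5;                 /* placeholder value */
}

/* what an extension could plug in: discount the I/O already in cache */
static void
my_cost_seqscan_hook(Path *path, PlannerInfo *root, RelOptInfo *baserel)
{
    double      cached = rel_cached_fraction(baserel);  /* 0.0 .. 1.0 */
    Cost        disk_cost = seq_page_cost * baserel->pages;

    /* treat cached pages as (almost) free to fetch again */
    path->total_cost -= disk_cost * cached * 0.9;
    if (path->total_cost < path->startup_cost)
        path->total_cost = path->startup_cost;
}

The same pattern would apply to cost_index() and friends; everything
stays in the extension, and core only needs to expose the hook.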

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
