Re: A query planner that learns

From: AgentM <agentm(at)themactionfaction(dot)com>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: A query planner that learns
Date: 2006-10-13 15:53:15
Message-ID: 1B7FAD85-20E9-40AE-8018-724F94E132F8@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Oct 13, 2006, at 11:47 , John D. Burger wrote:

> Erik Jones wrote:
>
>> Forgive me if I'm way off here as I'm not all that familiar with
>> the internals of postgres, but isn't this what the genetic query
>> optimizer discussed the one of the manual's appendixes is supposed
>> to do.
>
> No - it's not an "optimizer" in that sense. When there are a small
> enough set of tables involved, the planner uses a dynamic
> programming algorithm to explore the entire space of all possible
> plans. But the space grows exponentially (I think) with the number
> of tables - when this would take too long, the planner switches to
> a genetic algorithm approach, which explores a small fraction of
> the plan space, in a guided manner.
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading. (In particular,
> the statistic for total number of unique values is frequently =way=
> off, per a recent thread here. I have been reading about this,
> idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations. The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns. A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is a
> Gaussian, let's estimate the mean and variance, this one is a Beta
> distribution ...). Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.

One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.

Further down the line, statistics could be collected during the
execution of any query- updating histograms on delete and update, as
well.

-M

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2006-10-13 16:04:02 Re: UTF-8
Previous Message Harpreet Dhaliwal 2006-10-13 15:51:06 Server Added Y'day. Missing Today