Re: An Idea for planner hints

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: An Idea for planner hints
Date: 2006-08-09 19:14:18
Message-ID: 20060809191418.GM40481@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 09, 2006 at 02:02:10PM +0200, Martijn van Oosterhout wrote:
> On Wed, Aug 09, 2006 at 12:57:39PM +0200, Florian G. Pflug wrote:
> > Fixing the generic problem is surely the best _if_ there is a fix for
> > the generic problem at all. But if your where-conditions involves fields
> > from 10 different tables, then IMHO there is no way to _ever_ guarantee
> > that postgres will get correct selectivity estimates. But since (at
> > least for me) overestimating selectivity hurts fare more than
> > underestimating it, forcing postgres to just assume a certain
> > selectivity could help.
>
> I'm not sure if the problem is totally solvable, but we can certainly
> do a lot better than we do now.
>
> ISTM that what's really missing at the moment is some kind of
> post-mortem analysis that looks at the EXPLAIN ANALYZE output, pulls it
> apart and say: 'look, we went wrong here'. For leaf nodes trying to
> estimate the selectivity on a single table it easy. But working out the
> selectivity of join nodes is harder.
>
> Where we really fall down right now it that we do not recognise highly
> correlated columns. If we have the expression WHERE a = 1 AND b = 2 we
> assume the expressions are independant and multiply the selectivities
> together. Often this is the wrong thing to do.
>
> This also a problem for columns in different tables that get joined on.
> Currently we don't do anything special there either.
>
> Perhaps the way to go would be to allow users to declare columns often
> used together and have ANALYSE collect information on correlation which
> can be used later...

One thing that would help tremendously would be to collect stats on
multi-column indexes. That would probably hit a good chunk of our
problem areas.

Something this is related to is providing estimates for functions (which
has been discussed in the past). There were numerous proposals there,
but the one that stuck in my head was allowing users to define functions
that would provide appropriate stats based on some input. Granted,
that's a pretty low-level construct, but it's more than we have now, and
would allow for better schemes to be built on top of it.

As for query hints, I really wish we'd just bite the bullet and add
them. Yes, they're far from perfect, yes, we should "just fix the
planner", yes, it's ugly that they're per-statement, but ultimately
sometimes you have to just flat-out tell the planner to do things a
certain way. I suspect enough time has been spent debating them since
7.2 that they could have been implemented by now.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2006-08-09 19:26:15 Re: An Idea for planner hints
Previous Message Heikki Linnakangas 2006-08-09 19:04:17 Maintaining cluster order on insert