Re: WIP: cross column correlation ...

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: WIP: cross column correlation ...
Date: 2011-02-24 15:07:35
Message-ID: 201102241507.p1OF7ZU25175@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> If you want to take the above as in any way an exhaustive survey of
> >> the landscape (which it isn't), C seems like a standout, maybe
> >> augmented by the making the planner able to notice that A1 = x1 AND A2
> >> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite
> >> queries as much.
> >>
> >> I don't really know how to handle the join selectivity problem. ?I am
> >> not convinced that there is a better solution to that than decorating
> >> the query. ?After all the join selectivity depends not only on the
> >> join clause itself, but also on what you've filtered out of each table
> >> in the meantime.
> >
> > Thinking some more, I think another downside to the "decorate the query"
> > idea is that many queries use constants that are supplied only at
> > runtime, so there would be no way to hard-code a selectivity value into
> > a query when you don't know the value. ?Could a selectivity function
> > handle that?
>
> Beats me. What do you have in mind?

My point is just that many queries have constants who's values are not
known at the time the query is written, so any system should have a way
to handle that somehow. This is why query decoration is usually not a
good solution, and why something more flexible that is stored as part of
the column is preferred.

Perhaps a selectivity function that has easy access to the computed
selectivity of the constant involved might be a win. For example, for
the zip code/state code case we could have something like:

function mysel(zip, state) { pgsel(zip)}

meaning we would still use the selectivities found in the optimizer
statistics (pgsel), but modify them in some way. In the case above, the
selectivity only comes from the zip code. You could also do things like:

function mysel(x, y) { pgsel(x) * pgsel(y) * 0.001}

Such functions have a higher probability of working for all queries
involving that column.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-02-24 15:21:41 Re: Review: Fix snapshot taking inconsistencies
Previous Message Alvaro Herrera 2011-02-24 15:06:17 Re: Review: Fix snapshot taking inconsistencies