Re: proposal : cross-column stats

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-13 00:05:17
Message-ID: AANLkTimVUvrfZyGcV8seuT8Mrq+7xGgdRA4XJ6aRtDJV@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 12, 2010 at 9:43 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> The way I think of that problem is that once you know the postcode, knowing
> the city name doesn't add any information. The postcode implies the city
> name. So the selectivity for "postcode = ? AND city = ?" should be the
> selectivity of "postcode = ?" alone. The measurement we need is
> "implicativeness": How strongly does column A imply a certain value for
> column B. Perhaps that could be measured by counting the number of distinct
> values of column B for each value of column A, or something like that. I
> don't know what the statisticians call that property, or if there's some
> existing theory on how to measure that from a sample.

This is a good idea, but I guess the question is what you do next. If
you know that the "implicativeness" is 100%, you can disregard the
restriction clause on the implied column. And if column A has no
implicative power over column B, then you just do what we do now,
which is to treat the two clauses as independent and multiply their
selectivities. But what if there is some intermediate degree of
implicativeness?
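
To make the two endpoints and the open middle case concrete, here is a
rough sketch in Python. It is only an illustration under assumptions,
not something proposed in this thread: the function names, the choice
to measure implicativeness as the fraction of sampled rows whose A
value maps to exactly one B value, and the linear blend between the
two endpoints are all hypothetical.

```python
from collections import defaultdict


def implicativeness(rows):
    """Estimate how strongly column A implies column B from a sample.

    rows is an iterable of (a, b) pairs. The result is the fraction of
    rows whose A value is paired with exactly one distinct B value in
    the sample: 1.0 means A fully determines B in this sample.
    (Hypothetical measure, chosen only for illustration.)
    """
    b_values = defaultdict(set)   # distinct B values seen per A value
    row_counts = defaultdict(int)  # number of rows per A value
    total = 0
    for a, b in rows:
        b_values[a].add(b)
        row_counts[a] += 1
        total += 1
    if total == 0:
        return 0.0
    consistent = sum(n for a, n in row_counts.items()
                     if len(b_values[a]) == 1)
    return consistent / total


def combined_selectivity(sel_a, sel_b, f):
    """Blend the two known endpoints linearly (an assumption).

    f = 1.0 gives sel_a alone (the clause on B is disregarded);
    f = 0.0 gives sel_a * sel_b (independent clauses, as today).
    """
    return sel_a * (f + (1.0 - f) * sel_b)


# Made-up sample: postcode "99999" maps to two different cities.
sample = [
    ("10115", "Berlin"),
    ("10115", "Berlin"),
    ("80331", "Munich"),
    ("99999", "Berlin"),
    ("99999", "Potsdam"),
]
f = implicativeness(sample)
estimate = combined_selectivity(0.01, 0.1, f)
```

With f = 1.0 the blend collapses to the selectivity of the postcode
clause alone, and with f = 0.0 it collapses to the plain product, so
it at least agrees with both cases described above. Whether a linear
blend is the right behaviour in between is exactly the open question.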

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company