Re: Do we need multi-column frequency/histogram stats? WAS Re: Statistics "dependency"

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Do we need multi-column frequency/histogram stats? WAS Re: Statistics "dependency"
Date: 2017-04-23 15:07:46
Message-ID: df4ffd7a-882f-3df5-26a9-a91bff545086@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/23/2017 04:16 PM, Bruce Momjian wrote:
> On Sun, Apr 23, 2017 at 10:01:16AM -0400, Bruce Momjian wrote:
>> On Sun, Apr 23, 2017 at 11:44:12AM +0100, Simon Riggs wrote:
>>> For us "functional dependency" would sound like something to do with
>>> functions (e.g. CREATE FUNCTION), so just "dependency" appears to me
>>> to be the best term for this.
>>>
>>> There are multiple statistics for dependency stored, hence
>>> "dependencies". I don't like it, but its the best term I can see at
>>> present.
>>
>> OK, thank you for the reply, and I am sorry I forgot the previous
>> discussion. I just wanted to re-check we had research this. Thanks.
>
> (Email subject updated.)
>
> Actually, I have a larger question that I was thinking about. Because
> we already have lots of per-column stats, and now the dependency score,
> is it possible to mix the per-column stats and dependency score in a way
> that multi-column frequency/histogram stats are not necessary? That
> might be a less costly approach I had not considered.

Certainly not. Functional dependencies are "global" statistics, and only
a very specific type of it. It only tells you that a particular column
"implies" another column, i.e. knowledge of a value in A means there's
only a single possible value in "B". That has a number of implications:

* It only works for equality conditions. No inequalities or so.

* It assumes the queries are "consistent" with the functional dependencies.

* There are dependencies/correlations that are not functional
dependencies, while MCV/histograms would help.

Functional dependencies was the simplest type of extended statistics,
and so it was the first one to implement (and introduce all the
infrastructure). But we still need the other types.

The other types of statistics actually track correlation between values
in the columns, not just "column A implies column B".

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-04-23 15:41:08 Re: Vacuum: allow usage of more than 1GB of work mem
Previous Message Tomas Vondra 2017-04-23 15:00:36 Re: Statistics "dependency"