Re: Additional improvements to extended statistics

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-03-09 08:35:48
Message-ID: CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH=rw@mail.gmail.com
Lists: pgsql-hackers

On Mon, 9 Mar 2020 at 00:02, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Speaking of which, would you take a look at [1]? I think supporting SAOP
> is fine, but I wonder if you agree with my conclusion we can't really
> support inclusion @> as explained in [2].
>

Hmm, I'm not sure. However, thinking about your example in [2] reminds
me of a thought I had a while ago, but then forgot about --- there is
a flaw in the formula used for computing probabilities with functional
dependencies:

P(a,b) = P(a) * [f + (1-f)*P(b)]

because it might return a value that is larger than P(b), which
obviously should not be possible. We should amend that formula to
prevent a result larger than P(b). The obvious way to do that would be
to use:

P(a,b) = Min(P(a) * [f + (1-f)*P(b)], P(b))

but actually I think it would be better and more principled to use:

P(a,b) = f*Min(P(a),P(b)) + (1-f)*P(a)*P(b)

I.e., for those rows believed to be functionally dependent, we use the
minimum probability, and for the rows believed to be independent, we
use the product.
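
To make the difference concrete, here is a minimal Python sketch of the
three formulas above. The function names and the sample values are
hypothetical, chosen only for illustration; this is not the code in the
PostgreSQL source, just the arithmetic, with f taken as the degree of
the functional dependency and P(a), P(b) as the per-clause
selectivities.

```python
def dep_original(p_a, p_b, f):
    """Existing formula: P(a) * [f + (1-f)*P(b)]."""
    return p_a * (f + (1.0 - f) * p_b)


def dep_clamped(p_a, p_b, f):
    """Existing formula, clamped so that it never exceeds P(b)."""
    return min(dep_original(p_a, p_b, f), p_b)


def dep_weighted(p_a, p_b, f):
    """Proposed formula: f*Min(P(a),P(b)) + (1-f)*P(a)*P(b)."""
    return f * min(p_a, p_b) + (1.0 - f) * p_a * p_b


if __name__ == "__main__":
    # Hypothetical inputs: a strong dependency and a clause on b that
    # is much more selective than the clause on a.
    p_a, p_b, f = 0.5, 0.1, 0.9
    print("original:", dep_original(p_a, p_b, f))
    print("clamped: ", dep_clamped(p_a, p_b, f))
    print("weighted:", dep_weighted(p_a, p_b, f))
```

With these inputs the existing formula gives about 0.455, well above
P(b) = 0.1, the clamped variant gives 0.1, and the weighted variant
gives about 0.095. Since P(a)*P(b) <= Min(P(a),P(b)) whenever both
probabilities are in [0,1], the weighted form is a blend of two values
that are each at most Min(P(a),P(b)), so it cannot exceed either P(a)
or P(b).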

I think that would solve the problem with the example you gave at the
end of [2], but I'm not sure if it helps with the general case.

Regards,
Dean

> [1] https://www.postgresql.org/message-id/flat/13902317(dot)Eha0YfKkKy(at)pierred-pdoc
> [2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development
