Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-18 00:29:46
Message-ID: 20200318002946.6dvblukm3cfmgir2@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 17, 2020 at 06:05:17PM +0100, Tomas Vondra wrote:
>On Tue, Mar 17, 2020 at 04:14:26PM +0000, Dean Rasheed wrote:
>>On Tue, 17 Mar 2020 at 15:37, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>>On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:
>>>
>>>>The other thing that I'm still concerned about is the possibility of
>>>>returning estimates with P(a,b) > P(a) or P(b). I think that such a
>>>>thing becomes much more likely with the new types of clause supported
>>>>here, because they now allow multiple values from each column, where
>>>>before we only allowed one. I took another look at the patch I posted
>>>>on the other thread, and I've convinced myself that it's correct.
>>>>Attached is an updated version, with some cosmetic tidying up and now
>>>>with some additional regression tests.
>>>
>>>Yeah, I agree that's something we need to fix. Do you plan to push the
>>>fix, or do you want me to do it?
>>>
>>
>>I can push it. Have you had a chance to review it?
>>
>
>Not yet, I'll take a look today.
>

OK, I took a look. I think from the correctness POV the patch is OK, but
I think the dependencies_clauselist_selectivity() function now got a bit
too complex. I've been able to parse it now, but I'm sure I'll have
trouble in the future :-(

Can we refactor / split it somehow and move bits of the logic to smaller
functions, or something like that?

Another thing I'd like to suggest is keeping the "old" formula, and
instead of just replacing it with

P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

but explaining how the old formula may produce nonsensical selectivity,
and how the new formula addresses that issue.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-03-18 00:31:43 Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING
Previous Message Nikita Glukhov 2020-03-18 00:27:37 Re: [PATCH] Opclass parameters