Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-25 00:28:01
Message-ID: 20200325002801.i45f34l45rqmik4v@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 19, 2020 at 07:53:39PM +0000, Dean Rasheed wrote:
>On Wed, 18 Mar 2020 at 00:29, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> OK, I took a look. I think from the correctness POV the patch is OK, but
>> I think the dependencies_clauselist_selectivity() function now got a bit
>> too complex. I've been able to parse it now, but I'm sure I'll have
>> trouble in the future :-(
>>
>> Can we refactor / split it somehow and move bits of the logic to smaller
>> functions, or something like that?
>>
>
>Yeah, it has gotten a bit long. It's somewhat tricky splitting it up,
>because of the number of shared variables used throughout the
>function, but here's an updated patch splitting it into what seemed
>like the 2 most logical pieces. The first piece (still in
>dependencies_clauselist_selectivity()) works out what dependencies
>can/should be applied, and the second piece in a new function does the
>actual work of applying the list of functional dependencies to the
>clause list.
>
>I think that has made it easier to follow, and it has also reduced the
>complexity of the final "no applicable stats" branch.
>

Seems OK to me.

I'd perhaps name deps_clauselist_selectivity differently, it's a bit too
similar to dependencies_clauselist_selectivity. Perhaps something like
clauselist_apply_dependencies? But that's a minor detail.

>> Another thing I'd like to suggest is keeping the "old" formula, and
>> instead of just replacing it with
>>
>> P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
>>
>> but explaining how the old formula may produce nonsensical selectivity,
>> and how the new formula addresses that issue.
>>
>
>I think this is purely a comment issue? I've added some more extensive
>comments attempting to justify the formulae.
>

Yes, it was purely a comment issue. Seems fine now.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-03-25 00:39:29 Re: Index Skip Scan
Previous Message James Coleman 2020-03-25 00:23:44 Re: [PATCH] Incremental sort (was: PoC: Partial sort)