Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-18 15:55:43
Message-ID: 20200318155543.n72l4lj5kkcklbik@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 17, 2020 at 04:37:06PM +0100, Tomas Vondra wrote:
>On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:
>>On Sat, 14 Mar 2020 at 18:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>>I realized there's one more thing that probably needs discussing.
>>>Essentially, these two clause types are the same:
>>>
>>> a IN (1, 2, 3)
>>>
>>> (a = 1 OR a = 2 OR a = 3)
>>>
>>>but with 8f321bd1 we only recognize the first one as compatible with
>>>functional dependencies. It was always the case that we estimated those
>>>two clauses a bit differently, but the differences were usually small.
>>>But now that we recognize IN as compatible with dependencies, the
>>>difference may be much larger, which bugs me a bit ...
>>>
>>>So I wonder if we should recognize the special form of an OR clause,
>>>with all Vars referencing to the same attribute etc. and treat this as
>>>supported by functional dependencies - the attached patch does that.
>>>MCV lists there's already no difference because OR clauses are
>>>supported.
>>>
>>
>>Makes sense, and the patch looks straightforward enough.
>>
>>>The question is whether we want to do this, and whether we should also
>>>teach the per-column estimates to recognize this special case of IN
>>>clause.
>>
>>I'm not convinced about that second part though. I'd say that
>>recognising the OR clause for functional dependencies is sufficient to
>>prevent the large differences in estimates relative to the equivalent
>>IN clauses. The small differences between the way that OR and IN
>>clauses are handled have always been there, and I think that changing
>>that is out of scope for this work.
>>
>
>Not sure. I think the inconsistency between plan and extended stats may
>be a bit surprising, but I agree that issue may be negligible.
>

OK, I've pushed the change recognizing the special case of OR clauses as
supported by functional dependencies. I've left the estimation of the
clause itself as it's, we can address that in the future if needed.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mike Palmiotto 2020-03-18 16:07:03 Re: Auxiliary Processes and MyAuxProc
Previous Message Tomas Vondra 2020-03-18 15:53:26 Re: Multivariate MCV list vs. statistics target