Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-17 15:37:06
Message-ID: 20200317153706.z6pdtijr25iyr424@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:
>On Sat, 14 Mar 2020 at 18:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> I realized there's one more thing that probably needs discussing.
>> Essentially, these two clause types are the same:
>>
>> a IN (1, 2, 3)
>>
>> (a = 1 OR a = 2 OR a = 3)
>>
>> but with 8f321bd1 we only recognize the first one as compatible with
>> functional dependencies. It was always the case that we estimated those
>> two clauses a bit differently, but the differences were usually small.
>> But now that we recognize IN as compatible with dependencies, the
>> difference may be much larger, which bugs me a bit ...
>>
>> So I wonder if we should recognize the special form of an OR clause,
>> with all Vars referencing to the same attribute etc. and treat this as
>> supported by functional dependencies - the attached patch does that.
>> MCV lists there's already no difference because OR clauses are
>> supported.
>>
>
>Makes sense, and the patch looks straightforward enough.
>
>> The question is whether we want to do this, and whether we should also
>> teach the per-column estimates to recognize this special case of IN
>> clause.
>
>I'm not convinced about that second part though. I'd say that
>recognising the OR clause for functional dependencies is sufficient to
>prevent the large differences in estimates relative to the equivalent
>IN clauses. The small differences between the way that OR and IN
>clauses are handled have always been there, and I think that changing
>that is out of scope for this work.
>

Not sure. I think the inconsistency between plan and extended stats may
be a bit surprising, but I agree that issue may be negligible.

>The other thing that I'm still concerned about is the possibility of
>returning estimates with P(a,b) > P(a) or P(b). I think that such a
>thing becomes much more likely with the new types of clause supported
>here, because they now allow multiple values from each column, where
>before we only allowed one. I took another look at the patch I posted
>on the other thread, and I've convinced myself that it's correct.
>Attached is an updated version, with some cosmetic tidying up and now
>with some additional regression tests.
>

Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-17 16:11:16 Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library
Previous Message Julien Rouhaud 2020-03-17 15:31:36 Re: WAL usage calculation patch