Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-12 17:30:47
Message-ID: 20200312173047.stp55sxoj4sipn7q@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 12, 2020 at 10:25:41AM +0000, Dean Rasheed wrote:
>[ For the sake of the archives, some of the discussion on the other
>thread [1-3] should really have been on this thread. ]
>
>On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> I think the challenge here is in applying the functional dependency
>> computed for the whole array to individual elements. I'm not sure we can
>> do that.
>>
>> For example, with a table like this:
>>
>> CREATE TABLE t (a int, b int[]);
>> CREATE STATISTICS s (dependencies) ON a, b FROM t;
>>
>> Let's say the functional dependency is "perfect" i.e. has strength 1.0.
>> But that only tells us dependency for complete array values, we don't
>> know how much information we gain by knowledge of subset of the values.
>>
>
>The more I think about this example, the more I think this is really
>just a special case of the more general problem of compatibility of
>clauses. Once you add support for IN (...) clauses, any query of the
>form
>
> SELECT ... WHERE (any clauses on col a) AND (any clauses on col b)
>
>can be recast as
>
> SELECT ... WHERE a IN (...) AND b IN (...)
>
>so any counter-example with bad estimates produced with a query in the
>first form can also be written in the second form.
>
>I think we should really be thinking in terms of making a strong
>functional dependency (a => b) applicable generally to queries in the
>first form, which will work well if the clauses on b are compatible
>with those on b, but not if they're incompatible. However, that's not
>so very different from the current state without extended stats, which
>assumes independence, and will return poor estimates if the
>columns/clauses aren't independent.
>

I'm sorry, but I don't see how we could do this for arbitrary clauses. I
think we could do that for clauses that have equality semantics and
reference column values as a whole. So I think it's possible to do this
for IN clauses (which is what the first part of the patch does), but I
don't think we can do it for the containment operator.

I.e. we can do that for

WHERE a IN (...) AND b IN (...)

but I don't see how we could do that for

WHERE a @> (...) AND b @> (...)

I don't think the dependency degree gives us any reliable insight into
statistical dependency of elements of the values.

Or maybe we're just talking about different things? You seem to be
talking abotu IN clauses (which I think is doable), but my question was
about using functional dependencies to estimate array containment
clauses (which I think is not really doable).

>So I'd be tempted to apply a tidied up version of the patch from [3],
>and then lift all restrictions from dependency_is_compatible_clause(),
>other than the requirement that the clause refer to a single variable.
>

I haven't looked at the patch from [3] closely yet, but you're right

P(A & B) <= Min(P(A), P(B))

and the approach you proposed seems reasonable. I don't think how we
can just remove all the restriction on clause type - the restriction
that dependencies only handle equality-like clauses seems pretty much
baked into the dependencies.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-03-12 18:36:33 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Tom Lane 2020-03-12 17:19:56 Re: ALTER tbl rewrite loses CLUSTER ON index