Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-03-12 10:25:41
Message-ID: CAEZATCXnQtw-ZytE+rp1Uby9KMs0hWim-unMa-nMdmS5MMgtbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ For the sake of the archives, some of the discussion on the other
thread [1-3] should really have been on this thread. ]

On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> I think the challenge here is in applying the functional dependency
> computed for the whole array to individual elements. I'm not sure we can
> do that.
>
> For example, with a table like this:
>
> CREATE TABLE t (a int, b int[]);
> CREATE STATISTICS s (dependencies) ON a, b FROM t;
>
> Let's say the functional dependency is "perfect" i.e. has strength 1.0.
> But that only tells us dependency for complete array values, we don't
> know how much information we gain by knowledge of subset of the values.
>

The more I think about this example, the more I think this is really
just a special case of the more general problem of compatibility of
clauses. Once you add support for IN (...) clauses, any query of the
form

SELECT ... WHERE (any clauses on col a) AND (any clauses on col b)

can be recast as

SELECT ... WHERE a IN (...) AND b IN (...)

so any counter-example with bad estimates produced with a query in the
first form can also be written in the second form.

I think we should really be thinking in terms of making a strong
functional dependency (a => b) applicable generally to queries in the
first form, which will work well if the clauses on b are compatible
with those on b, but not if they're incompatible. However, that's not
so very different from the current state without extended stats, which
assumes independence, and will return poor estimates if the
columns/clauses aren't independent.

So I'd be tempted to apply a tidied up version of the patch from [3],
and then lift all restrictions from dependency_is_compatible_clause(),
other than the requirement that the clause refer to a single variable.

Regards,
Dean

[1] https://www.postgresql.org/message-id/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200309181915.5lxhuw2qxoihfoqo%40development
[3] https://www.postgresql.org/message-id/CAEZATCUic8PwhTnexC%2BUx-Z_e5MhWD-8jk%3DJ1MtnVW8TJD%2BVHw%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-03-12 10:31:15 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Richard Guo 2020-03-12 10:22:25 Re: Optimizer Doesn't Push Down Where Expressions on Rollups