Re: PATCH: add support for IN and @> in functional-dependency statistics use

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: add support for IN and @> in functional-dependency statistics use
Date: 2020-02-01 14:24:46
Message-ID: 20200201142446.aq5zf3gawj3ynrc2@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:
>Hello
>
>At my current job, we have a lot of multi-tenant databases, thus with tables
>containing a tenant_id column. Such a column introduces a severe bias in
>statistics estimation since any other FK in the next columns is very likely to
>have a functional dependency on the tenant id. We found several queries where
>this functional dependency messed up the estimations so much the optimizer
>chose wrong plans.
>When we tried to use extended statistics with CREATE STATISTIC on tenant_id,
>other_id, we noticed that the current implementation for detecting functional
>dependency lacks two features (at least in our use case):
>- support for IN clauses
>- support for the array contains operator (that could be considered as a
>special case of IN)
>

Thanks for the patch. I don't think the lack of support for these clause
types is an oversight - we haven't done them because we were not quite
sure the functional dependencies can actually apply to them. But maybe
we can support them, I'm not against that in principle.

>After digging in the source code, I think the lack of support for IN clauses
>is an oversight and due to the fact that IN clauses are ScalarArrayOpExpr
>instead of OpExpr. The attached patch fixes this by simply copying the code-
>path for OpExpr and changing the type name. It compiles and the results are
>correct, with a dependency being correctly taken into consideration when
>estimating rows. If you think such a copy paste is bad and should be factored
>in another static bool function, please say so and I will happily provide an
>updated patch.

Hmmm. Consider a query like this:

... WHERE tenant_id = 1 AND another_column IN (2,3)

which kinda contradicts the idea of functional dependencies that knowing
a value in one column, tells us something about a value in a second
column. But that assumes a single value, which is not quite true here.

The above query is essentially the same thing as

... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))

and also

... WHERE (tenant_id=1 AND another_column=2)
OR (tenant_id=1 AND another_column=3)

at wchich point we could apply functional dependencies - but we'd do it
once for each AND-clause, and then combine the results to compute
selectivity for the OR clause.

But this means that if we apply functional dependencies directly to the
original clause, it'll be inconsistent. Which seems a bit unfortunate.

Or do I get this wrong?

BTW the code added in the 0001 patch is the same as for is_opclause, so
maybe we can simply do

if (is_opclause(rinfo->clause) ||
IsA(rinfo->clause, ScalarArrayOpExpr))
{
...
}

instead of just duplicating the code. We also need some at least some
regression tests, testing functional dependencies with this clause type.

>The lack of support for @> operator, on the other hand, seems to be a decision
>taken when writing the initial code, but I can not find any mathematical nor
>technical reason for it. The current code restricts dependency calculation to
>the = operator, obviously because inequality operators are not going to
>work... but array contains is just several = operators grouped, thus the same
>for the dependency calculation. The second patch refactors the operator check
>in order to also include array contains.
>

No concrete opinion on this yet. I think my concerns are pretty similar
to the IN clause, although I'm not sure what you mean by "this could be
considered as special case of IN".

>I tested the patches on current HEAD, but I can test and provide back-ported
>versions of the patch for other versions if needed (this code path hardly
>changed since its introduction in 10).

I think the chance of this getting backpatched is zero, because it might
easily break existing apps.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-02-01 15:26:31 Re: widen vacuum buffer counters
Previous Message 曾文旌 (义从) 2020-02-01 13:39:03 Re: [Proposal] Global temporary tables