Re: New design for FK-based join selectivity estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ronan(dot)dunklau(at)dalibo(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>
Subject: Re: New design for FK-based join selectivity estimation
Date: 2016-12-13 19:44:07
Message-ID: 23545.1481658247@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

ronan(dot)dunklau(at)dalibo(dot)com writes:
> On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote:
>> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduce an
>> estimation error :

> The problem is, for semi and anti joins, we assume that we have nohing to do
> (costsize.c:4253):

> else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
> {
> /*
> * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
> * fraction of LHS rows that have matches. If the referenced
> * table is on the inner side, that means the selectivity is 1.0
> * (modulo nulls, which we're ignoring for now). We already
> * covered the other case, so no work here.
> */
> }

> This results in assuming that the whole outerrel will match, no matter the
> selectivity of the innerrel.

Yeah. In the terms of this example, the FK means that every outer row
would have a match, if the query were

select * from t3 where j in (select * from t4);

But actually it's

select * from t3 where j in (select * from t4 where j<10);

so of course we should not expect a match for every row.

> If I understand it correctly and the above is right, I think we should ignore
> SEMI or ANTI joins altogether when considering FKs, and keep the corresponding
> restrictinfos for later processing since they are already special-cased later
> on.

That seems like an overreaction. While the old code happens to get this
example exactly right, eqjoinsel_semi is still full of assumptions and
approximations, and it doesn't do very well at all if it lacks MCV lists
for both sides.

I'm inclined to think that what we want to have happen in this case is
to estimate the fraction of outer rows having a match as equal to the
selectivity of the inner query's WHERE clauses, ie the semijoin
selectivity should be sizeof(inner result) divided by sizeof(inner
relation).

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-12-13 20:42:17 Re: Logical Replication WIP
Previous Message Kevin Grittner 2016-12-13 19:19:16 Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]