Re: New design for FK-based join selectivity estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ronan(dot)dunklau(at)dalibo(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>
Subject: Re: New design for FK-based join selectivity estimation
Date: 2016-12-15 20:51:43
Message-ID: 16149.1481835103@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> ronan(dot)dunklau(at)dalibo(dot)com writes:
>> If I understand it correctly and the above is right, I think we should ignore
>> SEMI or ANTI joins altogether when considering FKs, and keep the corresponding
>> restrictinfos for later processing since they are already special-cased later
>> on.

> That seems like an overreaction. While the old code happens to get this
> example exactly right, eqjoinsel_semi is still full of assumptions and
> approximations, and it doesn't do very well at all if it lacks MCV lists
> for both sides.

> I'm inclined to think that what we want to have happen in this case is
> to estimate the fraction of outer rows having a match as equal to the
> selectivity of the inner query's WHERE clauses, ie the semijoin
> selectivity should be sizeof(inner result) divided by sizeof(inner
> relation).

After further study, I concluded that we can only easily estimate that
when the inner side of the SEMI or ANTI join is just the single referenced
table. If the inner side is itself a join, it's not easy to determine
what fraction of the referenced table will survive the join clauses.

However, we can still be brighter than to just throw all the FK qual
clauses back into the pool: that would result in multiplying their
selectivity estimates together, which for a multi-column FK results in
exactly the drastic underestimation that 100340e2d intended to avoid.
What seems to make sense here is to take the minimum of the per-clause
selectivities, as we are doing for other outer-join cases.

Hence, I propose the attached patch. This rearranges the existing code
slightly to avoid duplicating it.

regards, tom lane

Attachment Content-Type Size
fix-semi-anti-join-selectivity-for-fk.patch text/x-diff 4.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-12-15 20:54:31 Re: Proposal for changes to recovery.conf API
Previous Message Magnus Hagander 2016-12-15 20:20:59 Re: Proposal for changes to recovery.conf API