Re: Regression in join selectivity estimations when using foreign keys

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Regression in join selectivity estimations when using foreign keys
Date: 2017-05-20 19:56:01
Message-ID: 25194.1495310161@sss.pgh.pa.us
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> I've been analyzing a reported regression case between a 9.5 plan and
> a 9.6 plan. I tracked this down to the foreign key join selectivity
> code, specifically the use_smallest_selectivity code which is applied
> to outer joins where the referenced table is on the outer side of the
> join.
> ...
> I've attached fixes, based on master, for both of these cases.

I'm entirely unconvinced by this patch --- it seems to simply be throwing
away a lot of logic. Notably it lobotomizes the FK code altogether for
semi/antijoin cases, but you've not shown any example that even involves
such joins, so what's the argument for doing that? Also, the reason
we had the use_smallest_selectivity code in the first place was that we
didn't believe the FK-based selectivity could be applied safely to
outer-join cases, so simply deciding that it's OK to apply it after all
seems insufficiently justified.

Or in short, exhibiting one counterexample to the existing code is not
a sufficient argument for changing things this much. You need to give
an argument why this is the right thing to do instead.

Stepping back a bit, it seems like the core thing the FK selectivity code
was meant to do was to prevent us from underestimating selectivity of
multiple-clause join conditions through a false assumption of clause
independence. The problem with the use_smallest_selectivity code is that
it's overestimating selectivity, but that doesn't mean that we want to go
all the way back to the old way of doing things. I wonder if we could get
any traction in these dubious cases by computing the product, instead of
minimum, of the clause selectivities (thus matching the old estimate) and
then taking the greater of that and the FK-based selectivity.
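
The suggestion in the last sentence can be sketched roughly as follows. This is only an illustration in Python, not the planner's C code: the function names and the sample numbers are hypothetical, and it assumes the per-clause selectivities and the FK-based selectivity have already been computed elsewhere.

```python
import math


def product_selectivity(clause_sels):
    """Old-style estimate: assume the clauses are independent and multiply."""
    return math.prod(clause_sels)


def smallest_selectivity(clause_sels):
    """Stand-in for the use_smallest_selectivity idea: take the minimum clause selectivity."""
    return min(clause_sels)


def proposed_selectivity(clause_sels, fk_sel):
    """Idea floated above: the greater of the product and the FK-based selectivity."""
    return max(product_selectivity(clause_sels), fk_sel)


# Hypothetical two-clause join condition matching a two-column foreign key.
clause_sels = [0.01, 0.02]
fk_sel = 0.001  # assumed FK-based estimate, e.g. 1 / rows in the referenced table

print("product :", product_selectivity(clause_sels))
print("smallest:", smallest_selectivity(clause_sels))
print("proposed:", proposed_selectivity(clause_sels, fk_sel))
```

With these made-up inputs the product falls below the FK-based figure and the minimum lies well above it, so the proposed combination returns the FK-based value. When the product is already larger than the FK-based value, the result is simply the old estimate.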

regards, tom lane
