Patch to fix FK-related selectivity estimates with constants

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Patch to fix FK-related selectivity estimates with constants
Date: 2020-10-27 03:47:37
Message-ID: 1019549.1603770457@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the thread at [1] it's discussed how our code for making
selectivity estimates using knowledge about FOREIGN KEY constraints
is busted in the face of EquivalenceClasses including constants.

That is, if fktab(a,b) is a 2-column FK reference to pktab(a,b)
and we have a query like

... where fktab.a = pktab.a and fktab.b = pktab.b

then we understand that any given fktab row can match at most one
pktab row (and this estimate is often a lot better than we'd get
from assuming that the a and b conditions are independent).

However, if the query is like

... where fktab.a = pktab.a and fktab.b = pktab.b
and fktab.a = 1

then this suddenly breaks down and we go back to non-FK-aware
estimates. The reason is that get_foreign_key_join_selectivity()
is looking for join clauses that equate the two sides of the FK
constraint ... and in this example, it will not see any such
join clause for column "a". That's because equivclass.c decided
to replace the given clauses with "fktab.a = 1 and pktab.a = 1",
which can be enforced at the scan level, leaving nothing to be
done for column "a" at the join level.

We can fix that by detecting which EquivalenceClasses are marked
"ec_has_const", since that's the property that dictates whether
equivclass.c uses this strategy. However, that's only a partial
fix; if you try it, you soon find that the selectivity estimates
are still off. The reason is that because the two "a = 1" conditions
are already factored into the size estimates for the join input
relations, we're essentially double-counting the "fktab.a = 1"
condition's selectivity if we use the existing FK selectivity
estimation rule. If we treated the constant condition as only
relevant to the PK side, then the FK selectivity rule could work
normally. But we don't want to drop the ability to enforce the
restriction at the scan level. So what we have to do is cancel
the FK side's condition's selectivity out of the FK selectivity.

Attached is a patch series that attacks it that way. For ease of
review I split it into two steps:

0001 refactors process_implied_equality() so that it can pass
back the new RestrictInfo to its callers in equivclass.c.
I think this is a good change on its own merits, because it means
that when generating a derived equality, we don't have to use
initialize_mergeclause_eclasses() to set up the new RestrictInfo's
left_ec and right_ec pointers. The equivclass.c caller knows
perfectly darn well which EquivalenceClass the two sides of the
clause belong to, so it can just assign that value, saving a couple
of potentially-not-cheap get_eclass_for_sort_expr() searches.
This does require process_implied_equality() to duplicate some of
the steps in distribute_qual_to_rels(), but on the other hand we
get to remove some complexity from distribute_qual_to_rels() because
it no longer has to deal with any is_deduced cases. Anyway, the
end goal of this step is that we can save away all the generated
"x = const" clauses in the EC's ec_derives list. 0001 doesn't
do anything with that information, but ...

0002 actually fixes the bug. Dealing with the first part of the
problem just requires counting how many of the ECs we matched to
an FK constraint are ec_has_const. To deal with the second part,
we dig out the scan-level "x = const" clause that the EC generated
for the FK column and see what selectivity it has got. This beats
other ways of reconstructing the scan-clause selectivity because
(at least in all normal cases) that selectivity would have been
cached in the RestrictInfo. Thus we not only save cycles but can be
sure we are cancelling out exactly the right amount of selectivity.

I would not propose back-patching this, but it seems OK for HEAD.
Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com

Attachment Content-Type Size
0001-refactor-process_implied_equality.patch text/x-diff 17.2 KB
0002-fix-fk-estimates-with-constants.patch text/x-diff 9.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-10-27 03:47:43 Re: Resetting spilled txn statistics in pg_stat_replication
Previous Message Greg Nancarrow 2020-10-27 03:45:47 Re: Parallel INSERT (INTO ... SELECT ...)