Re: pg9.6 segfault using simple query (related to use fk for join estimates)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Stefan Huehner <stefan(at)huehner(dot)org>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg9.6 segfault using simple query (related to use fk for join estimates)
Date: 2016-06-04 18:15:27
Message-ID: 28109.1465064127@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> FYI, I spoke to Tom Lane about this at PGCon and suggested that he
> look at the proposed patch as I requested in
> https://www.postgresql.org/message-id/CA+TgmobPqrAVXOBMHTcpDq8hX7gCzcVhoUvC8s9V=d09+bt30w@mail.gmail.com
> and see whether that would address his concerns, and he said that he
> would do that. It may, however, have slipped his mind.

> My opinion is that something needs to be done about this patch. It
> needs to be improved or reverted. Improved would be ideal in my mind,
> but reverted is an outcome I can live with.

I've taken a look at fkest_fixes_v2.patch. I do not like
mark_useful_foreign_keys one bit: it adds *more* overhead to *every*
query, whether or not there's anything to be saved. Also, adding a
flag to the ForeignKeyOptInfo entries doesn't save having to iterate
over the irrelevant entries. We should arrange for them not to be
in the list in the first place.

The direction I would go in to get rid of irrelevant FKs is more
like this:

* Make RelationGetFKeyList cache a list of ForeignKeyOptInfo structs,
not just constraint OIDs. It's insane that the relcache scans
pg_constraint to collect those OIDs and then the planner re-reads all
those same rows on every planning cycle. Allow the relcache to return
a pointer to the list in-cache, and require the planner to copy that
data before making any more cache requests. The planner would run
through the list, skipping single-key entries and entries leading to
irrelevant tables, and copy out just the items that are useful for
the current query (probably adding query-specific table RTE indexes
at the same time).

* Personally I'd probably handle the "ignore irrelevant tables" test
with a list_member_oid test on a list of relation OIDs, not a hashtable.
It's unlikely that there are enough tables in the query to justify
building a hashtable.

* All of the above should happen only if the query contains multiple
tables; there is no reason to expend even one cycle on loading FK data
in a simple single-table query.

But more generally, this entire approach does very little to improve
the problem of nested loops leading to probable poor big-O behavior.
I previously complained that the patch adds code that is executed

> * at least once per join relation created (hence, significantly more than
> the number of rels in the query; see also get_joinrel_parampathinfo)
> * for each inner relation in the initial input joinrel pair
> * for each outer relation in the initial joinrel pair
> * for each foreign key constraint on this inner and outer rel
> * for each key column in that FK
> * for each join qual for the current input joinrel pair
> * for each member of the relevant EquivalenceClass

and only the fourth of those seven nested loops is significantly shortened
by this patch. It's still about as brute-force as can be. I think that
fkest_fixes_v2.patch really only makes things significantly better for
cases where a table in the query has a huge number of FKs leading to
tables not in the query, and TBH I think that's an artificial test
condition that will seldom have much to do with real-world performance.

It's also pretty bad that "for each foreign key constraint on this inner
and outer rel" means "for each FK leading out of either the inner or outer
rel, whether or not it leads to the other side". The existing code has
to do that in order to handle cases where the particular representative
clause generated from an EC doesn't match the FK but another clause could
have. I still think that some more careful thought about relating FKs
to ECs could result in elimination of a couple of these looping levels
altogether.

This is certainly all attackable, and I'm even willing to help work on it,
but it's going to lead to a rather larger patch than we usually like to
apply in beta. And there are still all the non-planning-speed cleanup
issues to be dealt with. fkest_fixes_v2.patch addresses some of them
but I still feel that a lot of work is needed on the comments.

Also, there are serious bugs remaining, even without considering planning
speed. An example I just noticed is that quals_match_foreign_key actually
fails entirely to ensure that it's found a match to the FK, because there
is no check on whether foreignrel has anything to do with the FK. That
is, this bit:

* We make no attempt in checking that this foreign key actually
* references 'foreignrel', the reasoning here is that we may be able
* to match the foreign key to an eclass member Var of a RestrictInfo
* that's in qualslist, this Var may belong to some other relation.

would be okay if you checked after identifying a matching eclass member
that it belonged to the FK's referenced table, but AFAICS there is no such
check anywhere.

We need to decide whether we want to put a significant amount of time
into this patch over the next couple of weeks, to the exclusion of other
things, or revert it until the next devel cycle opens. Since we're not
exactly just sitting around twiddling our thumbs, I'm hesitant to commit
the required amount of time to pursue the first course. But we've
commissioned the RMT to make these sorts of decisions, so it's in their
lap.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-06-04 19:44:06 New design for FK-based join selectivity estimation
Previous Message Tom Lane 2016-06-04 15:29:41 Re: [sqlsmith] Failed assertion in joinrels.c