Re: Performance issue in foreign-key-aware join estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Performance issue in foreign-key-aware join estimation
Date: 2019-03-08 19:05:50
Message-ID: 22399.1552071950@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On Thu, 21 Feb 2019 at 15:00, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Anyway, I rebased the POC patch up to HEAD, just in case anyone
>> still wants to play with it.

> Cool. Thanks.

I haven't done any of the performance testing that this patch
clearly needs, but just in a quick look through the code:

* I seriously dislike the usage of root->eq_classes, primarily the
underdocumented way that it means one thing in some phases of the
planner and something else in other phases. You seem to be doing that
in hopes of saving some memory, but is the index data structure really
large enough to matter? This scheme is certainly unlikely to continue
to work if we add any additional uses of EquivalenceClassIndexes.
So I think it might be better to pass them around as explicit
arguments and/or attach them someplace else than PlannerInfo.

* I'm also not very excited about having both a fast and slow path
in places like has_relevant_eclass_joinclause() depending on whether
the index exists or not. IMO, if we're going to do this at all,
we should just go over to requiring the index to exist when needed,
and get rid of the slow paths. (A possible variant to that is "build
the index structure on-demand", though you'd have to be careful about
GEQO memory management.) Otherwise we'll forever be fighting hidden
planner-performance bugs of the form "if you call function xyz from
here, it'll be way slower than you expected".

* There's not much point in making EquivalenceClassIndex a Node type
if you're not going to wire it up to any of the Node infrastructure
(particularly outfuncs.c, which might be useful for debug purposes).
But I'm not really sure that it ought to be a distinct data structure
at all --- maybe this requirement should be more tightly integrated
with the ECs themselves? One idea of what that might look like is to
let each base RelOptInfo have a list of associated EquivalenceClasses,
so that you'd only have to search through directly-relevant ECs when
trying to prove something. But I'm just handwaving here.

* Spell check please, particularly EQUIVALANCE.

* Documentation of the data structure is pretty weak, eg what is
"this relation" in the comment about ei_index? And how do you
know how long the arrays are, or what they're indexed by? And
there's no explicit statement that ei_flags is a bitmask of those
other symbols, much less any real statement of what each flag means.

Setting the CF entry to WOA for now. I wonder though if we should
just push it out to v13 immediately --- are you intending to do more
with it in the near future?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2019-03-08 19:14:46 Re: Reaping Temp tables to avoid XID wraparound
Previous Message Peter Geoghegan 2019-03-08 18:48:25 Re: Making all nbtree entries unique by having heap TIDs participate in comparisons