Re: Performance issue in foreign-key-aware join estimation

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Performance issue in foreign-key-aware join estimation
Date: 2018-12-24 12:07:15
Message-ID: CAKJS1f_BQvjetGKsjT65gLAVWXQyRYRJpuXE2eBKrE0o0EcWwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Mon, 24 Dec 2018 at 09:38, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> Using the above idea, but rather than going to the trouble of storing
> PlannerInfo->eq_classes as an array type list, if we build the array
> on the fly inside match_foreign_keys_to_quals(), then build a
> Bitmapset type index to mark which of the eclasses contains members
> for each relation, then I can get the run-time for the function down
> to just 0.89%. Looking at other functions appearing high on the
> profile I also see; have_relevant_eclass_joinclause() (14%),
> generate_join_implied_equalities_for_ecs() (23%).

I've now expanded the proof of concept patch to use this indexing
technique for have_relevant_eclass_joinclause() and
generate_join_implied_equalities_for_ecs(). With Tom's test from
up-thread, I get:

Master:
latency average = 14125.374 ms

Patched:
latency average = 2417.164 ms

There are some other cases, such as
generate_implied_equalities_for_column(), that are possibly also
indexable, but in that case, we cannot use ec_relids to help build the
index since it does not keep track of other member relation
equivalence class members. That function is appearing at about 3.3% of
total plan time with the patched version of the code, so there's still
some small gains to be had there. The performance of
has_relevant_eclass_joinclause() could likely also be improved from
this indexing. According to my profiling, it's currently still about
2.6% of total planning time with the patched version.

I've attached the updated (rough) proof of concept patch. I ended up
stuffing the equivalence class index structure into PlannerInfo so
that it would be available in all the places it was required, but
build just once higher up the call stack. I don't believe this is the
correct solution for a finished patch, but I didn't really have any
better ideas and this seemed good enough to demonstrate what the
performance could look like.

Other ideas I have to further improve the performance of this query
would be to move the fkey_list out of PlannerInfo and instead include
a per-relation list inside RelOptInfo. This would allow
get_foreign_key_join_selectivity() to just look at foreign keys that
are relevant to the given relations rather than having to skip all
foreign keys that are not. This function is still accounting for about
5.5% of the total planning time for this query. I imagine it wouldn't
hurt match_foreign_keys_to_quals() too much to have it loop over each
RelOptInfo and look at the foreign keys defined on each of those.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
poc_eclass_indexing_v2.patch application/octet-stream 10.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2018-12-24 16:28:01 Re: Removing unneeded self joins
Previous Message Michael Paquier 2018-12-24 11:31:03 Re: Change pgarch_readyXlog() to return .history files first