Re: Performance issue in foreign-key-aware join estimation

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance issue in foreign-key-aware join estimation
Date: 2018-12-25 20:50:35
Message-ID: 5bbb3aa9-6b1c-2915-37d3-983df0a822bd@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 12/25/18 3:48 AM, David Rowley wrote:
> On Tue, 25 Dec 2018 at 13:46, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I however observe failures on 4 regression test suites - inherit,
>> equivclass, partition_join and partition_prune (diff attached). That's a
>> bit surprising, because AFAICS the patch merely optimizes the execution
>> and should not change the planning otherwise (all the failures seem to
>> be a query plan changing in some way). I'm not sure if the plans are
>> correct or better than the old ones.
>
> Seems I didn't run the tests after doing a last-minute move of the
> create_eclass_index() call in make_one_rel(). I'd previously had it
> just below set_base_rel_pathlists(), which meant that the index didn't
> exist in some cases so it would fall back on the original code. It
> appears the use case call from set_base_rel_pathlists() require
> is_child eclass members to be indexed too, but these were not in v1 or
> v2 since ec_relids does not record child rel members.
>
> It seems simple enough to fix this just by adding ec_allrelids and
> setting it for all members rather than just non-children members.
> This also allows me to add the two additional cases to allow
> generate_implied_equalities_for_column() and
> has_relevant_eclass_joinclause() to also make use of the eclass index.
> This further reduces the total planning time for the query on my
> machine to 2304 ms.
>

OK, that makes sense.

>> The other thing is that we probably should not use a single test case to
>> measure the optimization - I doubt it can improve less extreme queries,
>> but maybe we should verify it does not regress them?
>
> Agreed. That still needs to be verified. Although I'm yet unsure what
> sort of form we could use this idea in. I wonder if it's fine to
> create this eclass index on the fly, or if it's something we should
> keep maintained all the time. The problem I saw with keeping it all
> the time is down to eq_classes being a List and list_nth() is slow,
> which means the bms_next_member() loops I've added would perform
> poorly when compiled with a linked list lookup.
>

Yeah, good questions. I think the simplest thing we could do is building
them on the first access - that would at least ensure we don't build the
index without accessing it at least once.

Of course, it might still be faster to do the check directly, for small
numbers of eclasses / fkeys and rels. Perhaps there's some sort of
heuristics deciding when to build the indexes, but that seems like an
overkill at this point.

What I propose is constructing a "minimal" simple query invoking this
code (something like two rels with a join on a foreign key) and measure
impact on that. That seems like a fairly realistic use case.

ALso, I wonder what is te goal here - how much do we need to redude the
duration to be happy? Initially I'd say "on par with the state before
the FK patch" but I guess we're already optimizing beyond that point.
What was the timing before adding the FK stuff?

>> With the patch attached, bms_overlap gets quite high in the profiles. I
>> think one reason for that is that all bitmap operations (not just
>> bms_overlap) always start the loop at 0. For the upper boundary is
>> usually determined as Min() of the lengths, but we don't do that for
>> lower boundary because we don't track that. The bitmaps for eclasses are
>> likely sparse, so this is quite painful. Attaches is a simple patch that
>> adds tracking of "minword" and uses it in various bms_ methods to skip
>> initial part of the loop. On my machine this reduces the timings by
>> roughly 5% (from 1610 to 1530 ms).
>
> Seems like an interesting idea, although for the most part, Bitmapsets
> are small and this adds some overhead to all use cases. I doubt it is
> worth the trouble for bms_is_member(), since that does not need to
> perform a loop over each bitmapword.

Perhaps. The bitmapsets are generally small in number of members, but
the values may be actually quite high in some cases (I don't have any
exact statistics, though).

You're right it adds a bit of overhead, but I'd expect that to be
outweighted by the reduction of cache misses. But 5% is fairly close to
noise.

> It looks like most of the bms_overlap() usages are caused by
> functions like have_join_order_restriction(), join_is_legal(),
> check_outerjoin_delay() and add_paths_to_joinrel(), all of which are
> performing a loop over the join_info_list. Perhaps that could be
> indexed a similar way to the eq_classes List. I'd imagine there's
> about another 20-30% of performance to squeeze out of this by doing
> that, plus a bit more by making the fkey_list per RelOptInfo.
>

Looks promising.

> Attached is v3 of the hacked up proof of concept performance demo patch.
>

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-12-25 20:54:23 Re: GIN predicate locking slows down valgrind isolationtests tremendously
Previous Message Mitar 2018-12-25 18:26:56 Re: Feature: triggers on materialized views