Re: Removing unneeded self joins

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2018-06-25 15:26:28
Message-ID: d44ef452-10b8-7680-2984-84897283fa97@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David,

I tried to implement your suggestions, here are the patches.

The first patch mostly moves some code around without changing
functionality. It modifies innerrel_is_unique to not only cache the fact
that the relation is unique, but also
cache the index that guarantees uniqueness.

The second patch adds the unique self join removal code. It goes along
the lines of your plan. I didn't even have to examine join clauses
because the constraints imposed by innerrel_is_unique are strong enough
to guarantee that when it finds the same unique index for both
relations, the join can be removed. Namely, it requires mergejoinable
equality clauses for all columns of a unique index.

As a simple benchmark, I measured the duration of query_planner and
remove_useless_self_joins with clock_gettime() on the regression tests.
The following table shows average times in microseconds, median over 11
runs. First row is with this patch, and the second row doesn't call
remove_useless_self_joins and just calls clock_gettime to measure its
overhead.

              query_planner  remove_useless_self_joins

with removal       39.61            0.61

no removal         39.45            0.38

So, on this workload, unique self join removal adds about 0.2 mcs, or
0.6% of total time, to query_planner. I also tried a query that joins 26
relations, remove_useless_self_joins takes about 40 mcs. Still, this
time grows quadratically with number of relations we have to process, so
in the final patch I limit it to join_collapse_limit, which brings the
time down to 15 mcs. This is negligible compared to the total
query_planner time, which for 8 relations is about 3 ms, that is, 3
orders of magnitude higher.

These benchmarks mostly measure the path where we don't remove any
joins. I didn't time the join removal itself, because it wins quite some
time by allowing to plan one relation and one join less.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0002-Remove-unique-self-joins-v2.patch text/x-patch 18.7 KB
0001-Preparatory-refactoring-v2.patch text/x-patch 26.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeremy Finzel 2018-06-25 15:37:18 Some pgq table rewrite incompatibility with logical decoding?
Previous Message Andrew Dunstan 2018-06-25 14:41:41 Re: Using JSONB directly from application