Re: Removing unneeded self joins

From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Removing unneeded self joins
Date: 2022-03-01 00:09:49
Message-ID: Yh1kTUtj7+S4Wygy@ahch-to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of partitioned
> tables, for example, the overhead could grow. But in the context of overall
> planning time this overhead will be small till the large number of
> relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
>

I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.

This is what I found:

+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| test | mode | master | enabled | % | disabled | % |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read only | standard | 64418.13 | 63942.94 | -0.74 | 62231.38 | -3.39 |
| pgbench read only | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 | -6.92 |
| pgbench read only | extended | 55409.65 | 56427.63 | 1.84 | 55927.62 | 0.93 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read/write | standard | 9374.91 | 9135.21 | -2.56 | 8840.68 | -5.70 |
| pgbench read/write | prepared | 11849.86 | 11672.23 | -1.50 | 11393.39 | -3.85 |
| pgbench read/write | extended | 7976.80 | 7947.07 | -0.37 | 7788.99 | -2.35 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select non optimize 1 | standard | 80.97 | 81.29 | 0.40 | 81.30 | 0.41 |
| select non optimize 1 | prepared | 81.29 | 81.28 | -0.01 | 80.89 | -0.49 |
| select non optimize 1 | extended | 81.07 | 80.81 | -0.32 | 80.98 | -0.11 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 1 | standard | 15.84 | 13.90 |-12.25 | 15.80 | -0.25 |
| select optimized 1 | prepared | 15.24 | 13.82 | -9.32 | 15.55 | 2.03 |
| select optimized 1 | extended | 15.38 | 13.89 | -9.69 | 15.59 | 1.37 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 2 | standard | 10204.91 | 10818.39 | 6.01 | 10261.07 | 0.55 |
| select optimized 2 | prepared | 13284.06 | 15579.33 | 17.28 | 13116.22 | -1.26 |
| select optimized 2 | extended | 10143.43 | 10645.23 | 4.95 | 10142.77 | -0.01 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select shoe | standard | 5645.28 | 5661.71 | 0.29 | 6180.60 | 9.48 |
| select shoe | prepared | 9660.45 | 9602.37 | -0.60 | 9894.82 | 2.43 |
| select shoe | extended | 5666.47 | 5634.10 | -0.57 | 5757.26 | 1.60 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+

Obviously the pgbench runs are from the standard script. The numbers are
not clear for me, I can see improvementes with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiosly an improvement in one case.

I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:

insert into users
select seq, case when random() < 0.2 then null else random() * 1000 end,
random() * 10000
from generate_series(1, 1000000) seq
on conflict (nullable_int) do nothing;

for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.

After this tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment Content-Type Size
select_nonoptimize1.sql application/sql 74 bytes
select_optimize1.sql application/sql 142 bytes
select_optimize2.sql application/sql 155 bytes
select_shoe.sql application/sql 105 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-03-01 00:12:06 Re: Allow root ownership of client certificate key
Previous Message David Steele 2022-03-01 00:07:28 Re: Allow root ownership of client certificate key