Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-10 16:44:03
Message-ID: 4bbece40-d648-b6f9-a323-fcb8f4b45e78@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09.12.2020 23:53, David Rowley wrote:
> On Tue, 8 Dec 2020 at 20:15, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> I've attached another patchset that addresses some comments left by
>> Zhihong Yu over on [1]. The version number got bumped to v12 instead
>> of v11 as I still have a copy of the other version of the patch which
>> I made some changes to and internally named v11.
> If anyone else wants to have a look at these, please do so soon. I'm
> planning on starting to take a serious look at getting 0001-0003 in
> early next week.
>
> David
>
I tested the patched version of Postgres on JOBS benchmark:

https://github.com/gregrahn/join-order-benchmark

For most queries performance is the same, some queries are executed
faster but
one query is 150 times slower:

explain analyze SELECT MIN(chn.name) AS character,
       MIN(t.title) AS movie_with_american_producer
FROM char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     company_type AS ct,
     movie_companies AS mc,
     role_type AS rt,
     title AS t
WHERE ci.note LIKE '%(producer)%'
  AND cn.country_code = '[us]'
  AND t.production_year > 1990
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND ci.movie_id = mc.movie_id
  AND chn.id = ci.person_role_id
  AND rt.id = ci.role_id
  AND cn.id = mc.company_id
  AND ct.id = mc.company_type_id;
explain analyze SELECT MIN(cn.name) AS from_company,
       MIN(lt.link) AS movie_link_type,
       MIN(t.title) AS non_polish_sequel_movie
FROM company_name AS cn,
     company_type AS ct,
     keyword AS k,
     link_type AS lt,
     movie_companies AS mc,
     movie_keyword AS mk,
     movie_link AS ml,
     title AS t
WHERE cn.country_code !='[pl]'
  AND (cn.name LIKE '%Film%'
       OR cn.name LIKE '%Warner%')
  AND ct.kind ='production companies'
  AND k.keyword ='sequel'
  AND lt.link LIKE '%follow%'
  AND mc.note IS NULL
  AND t.production_year BETWEEN 1950 AND 2000
  AND lt.id = ml.link_type_id
  AND ml.movie_id = t.id
  AND t.id = mk.movie_id
  AND mk.keyword_id = k.id
  AND t.id = mc.movie_id
  AND mc.company_type_id = ct.id
  AND mc.company_id = cn.id
  AND ml.movie_id = mk.movie_id
  AND ml.movie_id = mc.movie_id
  AND mk.movie_id = mc.movie_id;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------
 Finalize Aggregate  (cost=300131.43..300131.44 rows=1 width=64)
(actual time=522985.919..522993.614 rows=1 loops=1)
   ->  Gather  (cost=300131.00..300131.41 rows=4 width=64) (actual
time=522985.908..522993.606 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=299131.00..299131.01 rows=1
width=64) (actual time=522726.599..522726.606 rows=1 loops=5)
               ->  Hash Join  (cost=38559.78..298508.36 rows=124527
width=33) (actual time=301521.477..522726.592 rows=2 loops=5)
                     Hash Cond: (ci.role_id = rt.id)
                     ->  Hash Join  (cost=38558.51..298064.76
rows=124527 width=37) (actual time=301521.418..522726.529 rows=2 loops=5)
                           Hash Cond: (mc.company_type_id = ct.id)
                           ->  Nested Loop (cost=38557.42..297390.45
rows=124527 width=41) (actual time=301521.392..522726.498 rows=2 loops=5)
                                 ->  Nested Loop
(cost=38556.98..287632.46 rows=255650 width=29) (actual
time=235.183..4596.950 rows=156421 loops=5)
                                       Join Filter: (t.id = ci.movie_id)
                                       ->  Parallel Hash Join
(cost=38556.53..84611.99 rows=162109 width=29) (actual
time=234.991..718.934 rows=119250 loops
=5)
                                             Hash Cond: (t.id =
mc.movie_id)
                                             ->  Parallel Seq Scan on
title t  (cost=0.00..43899.19 rows=435558 width=21) (actual
time=0.010..178.332 rows=34
9806 loops=5)
                                                   Filter:
(production_year > 1990)
                                                   Rows Removed by
Filter: 155856
                                             ->  Parallel Hash
(cost=34762.05..34762.05 rows=303558 width=8) (actual
time=234.282..234.285 rows=230760 loops
=5)
                                                   Buckets: 2097152
(originally 1048576)  Batches: 1 (originally 1)  Memory Usage: 69792kB
                                                   ->  Parallel Hash
Join  (cost=5346.12..34762.05 rows=303558 width=8) (actual
time=11.846..160.085 rows=230
760 loops=5)
                                                         Hash Cond:
(mc.company_id = cn.id)
                                                         -> Parallel
Seq Scan on movie_companies mc  (cost=0.00..27206.55 rows=841655
width=12) (actual time
=0.013..40.426 rows=521826 loops=5)
                                                         -> Parallel
Hash  (cost=4722.92..4722.92 rows=49856 width=4) (actual
time=11.658..11.659 rows=16969
 loops=5)
Buckets: 131072  Batches: 1  Memory Usage: 4448kB
->  Parallel Seq Scan on company_name cn  (cost=0.00..4722.92 rows=49856
width=4) (actual time
=0.014..8.324 rows=16969 loops=5)
Filter: ((country_code)::text = '[us]'::text)
Rows Removed by Filter: 30031
                                       ->  Result Cache
(cost=0.45..1.65 rows=2 width=12) (actual time=0.019..0.030 rows=1
loops=596250)
                                             Cache Key: mc.movie_id
                                             Hits: 55970  Misses:
62602  Evictions: 0  Overflows: 0  Memory Usage: 6824kB
                                             Worker 0:  Hits: 56042
Misses: 63657  Evictions: 0  Overflows: 0  Memory Usage: 6924kB
                                             Worker 1:  Hits: 56067
Misses: 63659  Evictions: 0  Overflows: 0  Memory Usage: 6906kB
                                             Worker 2:  Hits: 55947
Misses: 62171  Evictions: 0  Overflows: 0  Memory Usage: 6767kB
                                             Worker 3:  Hits: 56150
Misses: 63985  Evictions: 0  Overflows: 0  Memory Usage: 6945kB
                                             ->  Index Scan using
cast_info_movie_id_idx on cast_info ci  (cost=0.44..1.64 rows=2
width=12) (actual time=0.03
3..0.053 rows=1 loops=316074)
                                                   Index Cond:
(movie_id = mc.movie_id)
                                                   Filter:
((note)::text ~~ '%(producer)%'::text)
                                                   Rows Removed by
Filter: 25
                                 ->  Result Cache (cost=0.44..0.59
rows=1 width=20) (actual time=3.311..3.311 rows=0 loops=782104)
                                       Cache Key: ci.person_role_id
                                       Hits: 5  Misses: 156294
Evictions: 0  Overflows: 0  Memory Usage: 9769kB
                                       Worker 0:  Hits: 0  Misses:
156768  Evictions: 0  Overflows: 0  Memory Usage: 9799kB
                                       Worker 1:  Hits: 1  Misses:
156444  Evictions: 0  Overflows: 0  Memory Usage: 9778kB
                                       Worker 2:  Hits: 0  Misses:
156222  Evictions: 0  Overflows: 0  Memory Usage: 9764kB
                                       Worker 3:  Hits: 0  Misses:
156370  Evictions: 0  Overflows: 0  Memory Usage: 9774kB
                                       ->  Index Scan using
char_name_pkey on char_name chn  (cost=0.43..0.58 rows=1 width=20)
(actual time=0.001..0.001 rows
=0 loops=782098)
                                             Index Cond: (id =
ci.person_role_id)
                           ->  Hash  (cost=1.04..1.04 rows=4 width=4)
(actual time=0.014..0.014 rows=4 loops=5)
                                 Buckets: 1024  Batches: 1  Memory
Usage: 9kB
                                 ->  Seq Scan on company_type ct
(cost=0.00..1.04 rows=4 width=4) (actual time=0.012..0.012 rows=4 loops=5)
                     ->  Hash  (cost=1.12..1.12 rows=12 width=4)
(actual time=0.027..0.028 rows=12 loops=5)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on role_type rt
(cost=0.00..1.12 rows=12 width=4) (actual time=0.022..0.023 rows=12 loops=5)
 Planning Time: 2.398 ms
 Execution Time: 523002.608 ms
(55 rows)

I attach file with times of query execution.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
join.sql application/sql 110.2 KB
results.csv text/csv 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-12-10 17:16:02 Re: Change default of checkpoint_completion_target
Previous Message David Fetter 2020-12-10 16:15:12 Re: [HACKERS] [PATCH] Generic type subscripting