Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-19 02:45:55
Message-ID: CAApHDvoXeVizPh_J0Ystw14Z8LpyTz_7bZxisqGS6ucHtAruZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 25 May 2020 at 19:53, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I didn't quite get the LATERAL support quite done in the version I
> sent. For now, I'm not considering adding a Result Cache node if there
> are lateral vars in any location other than the inner side of the
> nested loop join. I think it'll just be a few lines to make it work
> though. I wanted to get some feedback before going to too much more
> trouble to make all cases work.

I've now changed the patch so that it supports adding a Result Cache
node to LATERAL joins.

e.g.

regression=# explain analyze select count(*) from tenk1 t1, lateral
(select x from generate_Series(1,t1.twenty) x) gs;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=150777.53..150777.54 rows=1 width=8) (actual
time=22.191..22.191 rows=1 loops=1)
-> Nested Loop (cost=0.01..125777.53 rows=10000000 width=0)
(actual time=0.010..16.980 rows=95000 loops=1)
-> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000
width=4) (actual time=0.003..0.866 rows=10000 loops=1)
-> Result Cache (cost=0.01..10.01 rows=1000 width=0)
(actual time=0.000..0.001 rows=10 loops=10000)
Cache Key: t1.twenty
Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
-> Function Scan on generate_series x
(cost=0.00..10.00 rows=1000 width=0) (actual time=0.001..0.002 rows=10
loops=20)
Planning Time: 0.046 ms
Execution Time: 22.208 ms
(9 rows)

Time: 22.704 ms
regression=# set enable_resultcache=0;
SET
Time: 0.367 ms
regression=# explain analyze select count(*) from tenk1 t1, lateral
(select x from generate_Series(1,t1.twenty) x) gs;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=225445.00..225445.01 rows=1 width=8) (actual
time=35.578..35.579 rows=1 loops=1)
-> Nested Loop (cost=0.00..200445.00 rows=10000000 width=0)
(actual time=0.008..30.196 rows=95000 loops=1)
-> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000
width=4) (actual time=0.002..0.905 rows=10000 loops=1)
-> Function Scan on generate_series x (cost=0.00..10.00
rows=1000 width=0) (actual time=0.001..0.002 rows=10 loops=10000)
Planning Time: 0.031 ms
Execution Time: 35.590 ms
(6 rows)

Time: 36.027 ms

v7 patch series attached.

I also modified the 0002 patch so instead of modifying simplehash.h's
SH_DELETE function to have it call SH_LOOKUP and the newly added
SH_DELETE_ITEM function, I've just added an entirely new
SH_DELETE_ITEM and left SH_DELETE untouched. Trying to remove the
code duplication without having a negative effect on performance was
tricky and it didn't save enough code to seem worthwhile enough.

I also did a round of polishing work, fixed a spelling mistake in a
comment and reworded a few other comments to make some meaning more
clear.

David

Attachment Content-Type Size
v7-0001-Allow-estimate_num_groups-to-pass-back-further-de.patch application/octet-stream 8.8 KB
v7-0002-Allow-users-of-simplehash.h-to-perform-direct-del.patch application/octet-stream 3.5 KB
v7-0003-Add-Result-Cache-executor-node.patch application/octet-stream 168.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-08-19 03:27:06 Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)
Previous Message Andres Freund 2020-08-19 02:34:00 Re: prepared transaction isolation tests