Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-02-03 06:51:53
Message-ID: CAApHDvodjxCbOrinbk6rh9M07gMVySQdJBRTxR4PKbxvPuzj3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> I tested the patched version of Postgres on JOBS benchmark:
>
> https://github.com/gregrahn/join-order-benchmark
>
> For most queries performance is the same, some queries are executed
> faster but
> one query is 150 times slower:

I set up my AMD 3990x machine here to run the join order benchmark. I
used a shared_buffers of 20GB so that all the data would fit in there.
work_mem was set to 256MB.

I used imdbpy2sql.py to parse the imdb database files and load the
data into PostgreSQL. This seemed to work okay apart from the
movie_info_idx table appeared to be missing. Many of the 113 join
order benchmark queries need this table. Without that table, only 71
of the queries can run. I've not yet investigated why the table was
not properly created and loaded.

I performed 5 different sets of tests using master at 9522085a, and
master with the attached series of patches applied.

Tests:
* Test 1 uses the standard setting of 100 for
default_statistics_target and has parallel query disabled.
* Test 2 again uses 100 for the default_statistics_target but enables
parallel query.
* Test 3 increases default_statistics_target to 10000 (then ANALYZE)
and disables parallel query.
* Test 4 as test 3 but with parallel query enabled.
* Test 5 changes the cost model for Result Cache so that instead of
using a result cache based on the estimated number of cache hits, the
costing is simplified to inject a Result Cache node to a parameterised
nested loop if the n_distinct estimate of the nested loop parameters
is less than half the row estimate of the outer plan.

I ran each query using pgbench for 20 seconds.

Test 1:

18 of the 71 queries used a Result Cache node. Overall the runtime of
those queries was reduced by 12.5% using v13 when compared to master.

Over each of the 71 queries, the total time to parse/plan/execute each
of the queries was reduced by 7.95%.

Test 2:

Again 18 queries used a Result Cache. The speedup was about 2.2% for
just those 18 and 2.1% over the 71 queries.

Test 3:

9 queries used a Result Cache. The speedup was 3.88% for those 9
queries and 0.79% over the 71 queries.

Test 4:

8 of the 71 queries used a Result Cache. The speedup was 4.61% over
those 8 queries and 4.53% over the 71 queries.

Test 5:

Saw 15 queries using a Result Cache node. These 15 ran 5.95% faster
than master and over all of the 71 queries, the benchmark was 0.32%
faster.

I see some of the queries do take quite a bit of effort for the query
planner due to the large number of joins. Some of the faster to
execute queries here took a little longer due to this.

The reason I increased the statistics targets to 10k was down to the
fact that I noticed that in test 2 that queries 15c and 15d became
slower. After checking the n_distinct estimate for the Result Cache
key column I found that the estimate was significantly out when
compared to the actual n_distinct. Manually correcting the n_distinct
caused the planner to move away from using a Result Cache for those
queries. However, I thought I'd check if increasing the statistics
targets allowed a better n_distinct estimate due to the larger number
of blocks being sampled. It did.

I've attached a spreadsheet with the results of each of the tests.

The attached file v13_costing_hacks.patch.txt is the quick and dirty
patch I put together to run test 5.

David

Attachment Content-Type Size
v13-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch text/plain 9.1 KB
v13-0004-Remove-code-duplication-in-nodeResultCache.c.patch text/plain 5.1 KB
v13-0005-Use-a-Result-Cache-node-to-cache-results-from-su.patch text/plain 27.2 KB
v13-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch text/plain 3.5 KB
v13-0003-Add-Result-Cache-executor-node.patch text/plain 144.9 KB
Result_cache_v13_vs_master.ods application/vnd.oasis.opendocument.spreadsheet 38.5 KB
v13_costing_hacks.patch.txt text/plain 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-02-03 06:53:42 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message vignesh C 2021-02-03 06:41:15 Re: Printing backtrace of postgres processes