Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-02-16 10:15:51
Message-ID: CAApHDvpbusiKMV=vZypdpHHu81u0zMVAp6hu1vg-=gQLBBKUPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 3 Feb 2021 at 19:51, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I've attached a spreadsheet with the results of each of the tests.
>
> The attached file v13_costing_hacks.patch.txt is the quick and dirty
> patch I put together to run test 5.

I've attached an updated set of patches. I'd forgotten to run make
check-world with the 0005 patch and that was making the CF bot
complain. I'm not intending 0005 for commit in the state that it's
in, so I've just dropped it.

I've also done some further performance testing with the attached set
of patched, this time I focused solely on planner performance using
the Join Order Benchmark. Some of the queries in this benchmark do
give the planner quite a bit of exercise. Queries such as 29b take my
1-year old, fairly powerful AMD hardware about 78 ms to make a plan
for.

The attached spreadsheet shows the details of the results of these
tests. Skip to the "Test6 no parallel 100 stats EXPLAIN only" sheet.

To get these results I just ran pgbench for 10 seconds on each query
prefixed with "EXPLAIN ".

To summarise here, the planner performance gets a fair bit worse with
the patched code. With master, summing the average planning time over
each of the queries resulted in a total planning time of 765.7 ms.
After patching, that went up to 1097.5 ms. I was pretty disappointed
about that.

On looking into why the performance gets worse, there's a few factors.
One factor is that I'm adding a new path to consider and if that path
sticks around then subsequent joins may consider that path. Changing
things around so I only ever add the best path, the time went down to
1067.4 ms. add_path() does tend to ditch inferior paths anyway, so
this may not really be a good thing to do. Another thing that I picked
up on was the code that checks if a Result Cache Path is legal to use,
it must check if the inner side of the join has any volatile
functions. If I just comment out those checks, then the total planning
time goes down to 985.6 ms. The estimate_num_groups() call that the
costing for the ResultCache path must do to estimate the cache hit
ratio is another factor. When replacing that call with a constant
value the total planning time goes down to 905.7 ms.

I can see perhaps ways that the volatile function checks could be
optimised a bit further, but the other stuff really is needed, so it
appears if we want this, then it seems like the planner is going to
become slightly slower. That does not exactly fill me with joy. We
currently have enable_partitionwise_aggregate and
enable_partitionwise_join which are both disabled by default because
of the possibility of slowing down the planner. One option could be
to make enable_resultcache off by default too. I'm not really liking
the idea of that much though since anyone who leaves the setting that
way won't ever get any gains from caching the inner side of
parameterised nested loop results.

The idea I had to speed up the volatile function call checks was along
similar lines to what parallel query does when it looks for parallel
unsafe functions in the parse. Right now those checks are only done
under a few conditions where we think that parallel query might
actually be used. (See standard_planner()). However, with Result
Cache, those could be used in many other cases too, so we don't really
have any means to short circuit those checks. There might be gains to
be had by checking the parse once rather than having to call
contains_volatile_functions in the various places we do call it. I
think both the parallel safety and volatile checks could then be done
in the same tree traverse. Anyway. I've not done any hacking on this.
It's just an idea so far.

Does anyone have any particular thoughts on the planner slowdown?

David

Attachment Content-Type Size
v14-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch text/plain 9.1 KB
v14-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch text/plain 3.5 KB
v14-0003-Add-Result-Cache-executor-node.patch text/plain 145.0 KB
v14-0004-Remove-code-duplication-in-nodeResultCache.c.patch text/plain 5.1 KB
Result_cache_v14_vs_master.ods application/vnd.oasis.opendocument.spreadsheet 48.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-16 10:22:11 Re: [PATCH] pg_hba.conf error messages for logical replication connections
Previous Message Mark Rofail 2021-02-16 10:07:10 Re: [HACKERS] GSoC 2017: Foreign Key Arrays