Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-26 08:03:36
Message-ID: CAKU4AWo9Mod6ou-PY63Z4wVwtD98fqRF1WB+7-XGb8a7opAUXg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 26, 2020 at 8:14 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Wed, 26 Aug 2020 at 05:18, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> >
> > On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >>
> >> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> >> > Also, just in case anyone is misunderstanding Andres' argument here.
> >> > It's entirely based on the performance impact of having an additional
> >> > node.
> >>
> >> Not entirely, no. It's also just that it doesn't make sense to have two
> >> nodes setting parameters that are then half magically picked up by a
> >> special subsidiary node type and used as a cache key. This is pseudo
> >> modularity, not real modularity. And it makes it harder to display
> >> useful information in explain etc. And it makes it harder to e.g. clear
> >> the cache in cases where we know that there's no further use of the
> >> current cache. At least without piercing the abstraction veil.
> >>
> >>
> >> > However, given the correct planner choice, there will never be
> >> > a gross slowdown due to having the extra node.
> >>
> >> There'll be a significant reduction in the increase in performance.
> >
> >
> > If this is a key blocking factor for this topic, I'd like to do a simple
> > hack to put the cache function into the subplan node, then do some tests
> > to show the real difference. But it would be better to decide first how
> > much difference can be thought of as a big difference. And for education
> > purposes, I'd like to understand where these differences come from. To my
> > current knowledge, my basic idea is that it saves some function calls?
>
> If testing this, the cache hit ratio will be pretty key to the
> results. You'd notice the overhead much less with a larger cache hit
> ratio, since you're not pulling the tuple from as deeply nested a node.
> I'm unsure how you'd determine what is a good cache hit ratio to
> test it with.

I wanted to test the worst case, where the cache hit ratio is 0, and then
compare the difference between putting the cache in a dedicated
node and in a SubPlan node. However, there is a better way
to test the difference, based on your message below.

> The lower the expected cache hit ratio, the higher
> the cost of the Result Cache node will be, so the planner has less
> chance of choosing to use it.

IIRC, we currently add the ResultCache for subplan nodes unconditionally.
The main reason is that we lack ndistinct estimation during subquery
planning. Tom suggested converting the AlternativeSubPlan to a SubPlan
in setrefs.c [1], and I also ran into a case that can be resolved if we do
such a conversion even earlier [2]. The basic idea is that we can do the
conversion once we can get the actual values for the subplan.

Something like:

    if (bms_is_subset(subplan->deps_relids, rel->relids))
        convert_alternativesubplans_to_subplan(rel);
You can see whether that would be helpful for ResultCache in this use case.
My patch in [2] is still at a very PoC stage, so it only takes care of
SubPlans in rel->reltarget.
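
To make the shape of that conversion a bit more concrete, here is a rough
sketch (choose_cheapest_subplan() is a hypothetical helper, and a real
version would also need to handle AlternativeSubPlans nested inside
expressions, not just at the top level of the target list):

    /*
     * PoC sketch only: once every rel a SubPlan depends on is available,
     * replace each top-level AlternativeSubPlan in the rel's target list
     * with the alternative we expect to be cheaper.
     */
    static void
    convert_alternativesubplans_to_subplan(RelOptInfo *rel)
    {
        ListCell   *lc;

        foreach(lc, rel->reltarget->exprs)
        {
            Node       *node = (Node *) lfirst(lc);

            if (IsA(node, AlternativeSubPlan))
            {
                /*
                 * Hypothetical helper: pick the cheaper member of the
                 * AlternativeSubPlan's subplans for the estimated number
                 * of executions, similar to what Tom suggests doing in
                 * setrefs.c in [1], just earlier.
                 */
                lfirst(lc) = choose_cheapest_subplan((AlternativeSubPlan *) node);
            }
        }
    }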

> Say you find a case with a hit ratio of 90%. Going by [1] I found
> pulling a tuple through an additional node to cost about 12
> nanoseconds on an intel 4712HQ CPU. With a hit ratio of 90% we'll
> only pull 10% of tuples through the additional node, so that's about
> 1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
> might become hard to measure above the noise. More costly inner scans
> will have the planner choose a Result Cache at lower estimated hit
> ratios, but in that case, pulling the tuple through the additional
> node during a cache miss will be less noticeable due to the more
> costly inner side of the join.
>
> Likely you could test the overhead only in theory without going to the
> trouble of adapting the code to make SubPlan and Nested Loop do the
> caching internally. If you just modify ExecResultCache() to have it
> simply return its subnode, then measure the performance with and
> without enable_resultcache, you should get an idea of the per-tuple
> overhead of pulling the tuple through the additional node on your CPU.
>

Thanks for the hints. I think we can test it even more easily with a Limit node.

create table test_pull_tuples(a int);
insert into test_pull_tuples select i from generate_series(1, 100000) i;

-- timed with pgbench:
select * from test_pull_tuples;              -- 18.850 ms
select * from test_pull_tuples limit 100000; -- 20.500 ms

Basically it is about 16 nanoseconds per tuple ((20.500 - 18.850) ms over
100,000 tuples) on my Intel(R) Xeon(R) CPU E5-2650. Personally, I'd say
the performance difference is negligible unless I see some different
numbers.
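
For completeness, if we did want to measure the dedicated node directly,
the pass-through hack you describe could be as small as this (an untested
sketch against your patch; I'm assuming ResultCacheState's first field
follows the usual PlanState layout so the standard executor macros apply):

    /*
     * Untested sketch: make the Result Cache node a pure pass-through so
     * that only the per-node tuple-pulling overhead remains measurable.
     */
    static TupleTableSlot *
    ExecResultCache(PlanState *pstate)
    {
        ResultCacheState *node = castNode(ResultCacheState, pstate);

        /* Skip all cache lookup/store logic; just pull from the subnode. */
        return ExecProcNode(outerPlanState(node));
    }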

[1] https://www.postgresql.org/message-id/1992952.1592785225%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/CAKU4AWoMRzZKk1vPstKTjS7sYeN43j8WtsAZy2pv73vm_E_6dA%40mail.gmail.com

--
Best Regards
Andy Fan
