Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-05-26 02:19:23
Message-ID: CAKU4AWprgUw6OqSo=Jtv0Cv0GnqsaxxvW4ag3LpzOyVy6MfAEA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 12, 2021 at 8:31 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> Thanks for these suggestions.
>
> On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:
> > > To summarise here, the planner performance gets a fair bit worse with
> > > the patched code. With master, summing the average planning time over
> > > each of the queries resulted in a total planning time of 765.7 ms.
> > > After patching, that went up to 1097.5 ms. I was pretty disappointed
> > > about that.
> >
> > I have a couple of ideas:
>

I just checked the latest code; it looks like we haven't improved this
situation, except that we introduced a GUC to control it. Am I missing
something? I don't have a suggestion myself, though.

> > - default enable_resultcache=off seems okay. In plenty of cases,
> >   planning time is unimportant. This is the "low bar" - if we can do
> >   better and enable it by default, that's great.
>
> I think that's reasonable. Teaching the planner to do new tricks is
> never going to make the planner produce plans more quickly. When the
> new planner trick gives us a more optimal plan, then great. When it
> does not, then it's wasted effort. Giving users the ability to switch
> off the planner's new ability seems like a good way to keep people
> happy who continually find that the additional effort costs more than
> it saves.
>
> > - Maybe this should be integrated into nestloop rather than being a
> >   separate plan node. That means that it could be dynamically enabled
> >   during execution, maybe after a few loops or after checking that
> >   there's at least some minimal number of repeated keys and cache hits.
> >   cost_nestloop would consider whether to use a result cache or not,
> >   and explain would show the cache stats as a part of nested loop. In
> >   this case, I propose there'd still be a GUC to disable it.
>
> There was quite a bit of discussion on that topic already on this
> thread. I don't really want to revisit that.
>
> The main problem with that is that we'd be forced into costing a
> Nested loop with a result cache exactly the same as we do for a plain
> nested loop. If we were to lower the cost to account for the cache
> hits then the planner is more likely to choose a nested loop over a
> merge/hash join. If we then switched the caching off during execution
> due to low cache hits then that does not magically fix the bad choice
> of join method. The planner may have gone with a Hash Join if it had
> known the cache hit ratio would be that bad. We'd still be left to
> deal with the poorly performing nested loop. What you'd really want
> instead of turning the cache off would be to have nested loop ditch
> the parameter scan and just morph itself into a Hash Join node. (I'm
> not proposing we do that)
>
> > - Maybe cost_resultcache() can be split into initial_cost and final_cost
> >   parts, same as for nestloop? I'm not sure how it'd work, since
> >   initial_cost is supposed to return a lower bound, and resultcache
> >   tries to make things cheaper. initial_cost would just add some
> >   operator/tuple costs to make sure that resultcache of a unique scan is
> >   more expensive than nestloop alone. estimate_num_groups is at least
> >   O(n) WRT rcpath->param_exprs, so maybe you charge
> >   100*list_length(param_exprs) * cpu_operator_cost in initial_cost and
> >   then call estimate_num_groups in final_cost. We'd be estimating the
> >   cost of estimating the cost...
>
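(Purely as an illustration of the split being suggested above, not actual
PostgreSQL code: a minimal standalone C sketch in which initial_cost charges
a cheap per-parameter placeholder and only final_cost pays for the n_distinct
estimate. The function names and the way the estimate is passed in are
hypothetical stand-ins.)

    /* Hypothetical sketch only -- not PostgreSQL source. */
    #include <stdio.h>

    #define CPU_OPERATOR_COST 0.0025        /* default cpu_operator_cost */

    /*
     * "initial" part: a cheap lower bound.  Charge a fixed per-parameter
     * amount so a Result Cache over a unique scan never looks cheaper than
     * the plain nested loop, without calling estimate_num_groups() yet.
     */
    static double
    initial_cost_resultcache(int n_param_exprs)
    {
        return 100.0 * n_param_exprs * CPU_OPERATOR_COST;
    }

    /*
     * "final" part: only surviving paths pay for the (relatively expensive)
     * n_distinct estimate, supplied here as a plain argument.
     */
    static double
    final_cost_resultcache(double calls, double ndistinct, double rescan_cost)
    {
        double misses = ndistinct;      /* assumes every entry fits; no evictions */

        return misses * rescan_cost;    /* cache hits treated as ~free */
    }

    int
    main(void)
    {
        printf("initial: %g\n", initial_cost_resultcache(2));
        printf("final:   %g\n", final_cost_resultcache(1000.0, 10.0, 5.0));
        return 0;
    }
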
> The cost of the Result Cache is pretty dependent on the n_distinct
> estimate. Low numbers of distinct values tend to result in a high
> estimated number of cache hits, whereas large n_distinct values
> (relative to the number of outer rows) are not going to result in a
> large estimated number of cache hits.
>
> I don't think feeding in a fake value would help us here. We'd
> probably do better if we had a fast way to determine if a given Expr
> is unique (e.g. the UniqueKeys patch). Result Cache is never going to
> be a win for a parameter whose value is never the same as some
> previously seen value. This would likely allow us to skip considering
> a Result Cache for the majority of OLTP-type joins.
>
> > - Maybe an initial implementation of this would only add a result cache
> >   if the best plan was already going to use a nested loop, even though a
> >   cached nested loop might be cheaper than other plans. This would avoid
> >   most planner costs, and give improved performance at execution time,
> >   but leaves something "on the table" for the future.
> >
> > > +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
> > > +                        Cost *rescan_startup_cost, Cost *rescan_total_cost)
> > > +{
> > > +     double      tuples = rcpath->subpath->rows;
> > > +     double      calls = rcpath->calls;
> > ...
> > > +     /* estimate on the distinct number of parameter values */
> > > +     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
> > > +                                     &estinfo);
> >
> > Shouldn't this pass "tuples" and not "calls"?
>
> Hmm, I don't think so. "calls" is the estimated number of outer-side
> rows. Here you're asking if the n_distinct estimate is relevant to
> the inner-side rows. It's not. If we expect to be called 1000 times by
> the outer side of the nested loop, then we need to know our n_distinct
> estimate for those 1000 rows. If the estimate comes back as 10
> distinct values and we see that we're likely to be able to fit all the
> tuples for those 10 distinct values in the cache, then the hit ratio
> is going to come out at 99%: 10 misses, one for the first time each
> value is looked up, and the remaining 990 calls will be hits. The
> number of tuples (and the width of tuples) on the inside of the nested
> loop is only relevant to calculating how many cache entries are likely
> to fit into hash_mem. When we think cache entries will be evicted,
> that makes the cache hit calculation more complex.
>
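(To make the arithmetic in the paragraph above concrete, a tiny
self-contained C sketch, assuming all distinct values' tuples fit in
hash_mem so there are no evictions; the numbers are the 1000-call,
10-distinct-value example from the quote, nothing more.)

    /* Illustrative only -- just the hit-ratio arithmetic described above. */
    #include <stdio.h>

    int
    main(void)
    {
        double calls = 1000.0;      /* estimated outer-side rows */
        double ndistinct = 10.0;    /* estimate_num_groups() over those calls */

        double misses = ndistinct;      /* first lookup of each distinct value */
        double hits = calls - misses;   /* every later lookup is a cache hit */
        double hit_ratio = hits / calls;

        /* prints: misses = 10, hits = 990, hit ratio = 99% */
        printf("misses = %.0f, hits = %.0f, hit ratio = %.0f%%\n",
               misses, hits, hit_ratio * 100.0);
        return 0;
    }
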
> I've tried to explain what's going on in cost_resultcache_rescan() the
> best I can with comments. I understand it's still pretty hard to
> follow what's going on. I'm open to making it easier to understand if
> you have suggestions.
>
> David
>

--
Best Regards
Andy Fan (https://www.aliyun.com/)
