Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-03-12 00:31:43
Message-ID: CAApHDvqAPM=mYq-7z_WVc4mErV+PJUOkytjNJ12WSsF5-1cftQ@mail.gmail.com
Lists: pgsql-hackers

Thanks for these suggestions.

On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:
> > To summarise here, the planner performance gets a fair bit worse with
> > the patched code. With master, summing the average planning time over
> > each of the queries resulted in a total planning time of 765.7 ms.
> > After patching, that went up to 1097.5 ms. I was pretty disappointed
> > about that.
>
> I have a couple of ideas:
>
> - default enable_resultcache=off seems okay. In plenty of cases, planning
> time is unimportant. This is the "low bar" - if we can do better and enable
> it by default, that's great.

I think that's reasonable. Teaching the planner to do new tricks is
never going to make it produce plans more quickly. When the new trick
gives us a better plan, then great. When it does not, it's wasted
effort. Giving users the ability to switch the planner's new ability
off seems like a good way to keep people happy when they consistently
find that the additional planning effort costs more than it saves.
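
For anyone in that position, opting out would be as simple as this
(using the GUC name from the suggestion quoted above):

    -- don't consider Result Cache paths in this session
    SET enable_resultcache = off;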

> - Maybe this should be integrated into nestloop rather than being a separate
> plan node. That means that it could be dynamically enabled during
> execution, maybe after a few loops or after checking that there's at least
> some minimal number of repeated keys and cache hits. cost_nestloop would
> consider whether to use a result cache or not, and explain would show the
> cache stats as a part of nested loop. In this case, I propose there'd still
> be a GUC to disable it.

There was quite a bit of discussion on that topic already on this
thread. I don't really want to revisit that.

The main problem with that is that we'd be forced to cost a nested
loop with a result cache exactly the same as a plain nested loop. If
we were to lower the cost to account for the expected cache hits, then
the planner would be more likely to choose a nested loop over a
merge/hash join. If we then switched the caching off during execution
due to a low hit ratio, that would not magically fix the bad choice of
join method. The planner might have gone with a Hash Join had it known
the cache hit ratio would be that bad. We'd still be left to deal with
the poorly performing nested loop. What you'd really want instead of
turning the cache off would be for the nested loop to ditch the
parameterised scan and morph itself into a Hash Join node. (I'm not
proposing we do that.)

> - Maybe cost_resultcache() can be split into initial_cost and final_cost
> parts, same as for nestloop ? I'm not sure how it'd work, since
> initial_cost is supposed to return a lower bound, and resultcache tries to
> make things cheaper. initial_cost would just add some operator/tuple costs
> to make sure that resultcache of a unique scan is more expensive than
> nestloop alone. estimate_num_groups is at least O(n) WRT
> rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
> cpu_operator_cost in initial_cost and then call estimate_num_groups in
> final_cost. We'd be estimating the cost of estimating the cost...

The cost of the Result Cache is highly dependent on the n_distinct
estimate. Low numbers of distinct values tend to produce estimates
with a high number of cache hits, whereas large n_distinct values
(relative to the number of outer rows) are not going to produce
estimates with many cache hits.

I don't think feeding in a fake value would help us here. We'd
probably do better if we had a fast way to determine whether a given
Expr is unique (e.g. the UniqueKeys patch). A Result Cache is never
going to be a win for a parameter whose value is never the same as
some previously seen value, and that check would likely allow us to
skip considering a Result Cache for the majority of OLTP-type joins.
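
As a contrived example (made-up tables, and assuming the planner puts
"orders" on the outer side of the nested loop):

    -- orders.id is unique, so the parameterised scan of order_lines
    -- never sees the same parameter value twice.  A Result Cache above
    -- it could never get a hit, and a cheap uniqueness check would let
    -- the planner skip building that path entirely.
    SELECT o.id, l.qty
    FROM orders o
    JOIN order_lines l ON l.order_id = o.id;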

> - Maybe an initial implementation of this would only add a result cache if the
> best plan was already going to use a nested loop, even though a cached
> nested loop might be cheaper than other plans. This would avoid most
> planner costs, and give improved performance at execution time, but leaves
> something "on the table" for the future.
>
> > +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
> > +                        Cost *rescan_startup_cost, Cost *rescan_total_cost)
> > +{
> > +    double    tuples = rcpath->subpath->rows;
> > +    double    calls = rcpath->calls;
> ...
> > +    /* estimate on the distinct number of parameter values */
> > +    ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
> > +                                    &estinfo);
>
> Shouldn't this pass "tuples" and not "calls" ?

hmm. I don't think so. "calls" is the estimated number of outer-side
rows. Here you're asking if the n_distinct estimate is relevant to the
inner-side rows. It's not. If we expect to be called 1000 times by the
outer side of the nested loop, then we need the n_distinct estimate
for those 1000 rows. If the estimate comes back as 10 distinct values,
and we see that we're likely to be able to fit all the tuples for
those 10 distinct values in the cache, then the hit ratio is going to
come out at 99%: 10 misses, one for the first lookup of each value,
and the remaining 990 calls will be hits. The number of tuples (and
the width of tuples) on the inside of the nested loop is only relevant
to calculating how many cache entries are likely to fit into hash_mem.
When we think cache entries will be evicted, the cache hit calculation
becomes more complex.
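
Spelling that arithmetic out, assuming no evictions:

    -- 1000 outer calls, 10 distinct parameter values, and all entries
    -- fit in hash_mem: each value misses once, every other call hits.
    SELECT 10                   AS misses,
           1000 - 10            AS hits,
           (1000 - 10) / 1000.0 AS hit_ratio;  -- 0.99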

I've tried to explain what's going on in cost_resultcache_rescan() the
best I can with comments. I understand it's still pretty hard to
follow what's going on. I'm open to making it easier to understand if
you have suggestions.

David
