Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-25 17:37:08
Message-ID: CAKU4AWrSE+09=e-mUx113QJWCqF=KnqvZ+v8y9xmbjywfB6Txw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > While I'm against introducing a separate node for the caching, I'm
> *not*
> > > against displaying a different node type when caching is
> > > present. E.g. it'd be perfectly reasonable from my POV to have a
> 'Cached
> > > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > > probably still want to display the 'Cache Key' similar to your example,
> > > but I don't see how it'd be better to display it with one more
> > > intermediary node.
> >
> > ...Well, this is difficult... For the record, in case anyone missed
> > it, I'm pretty set on being against doing any node overloading for
> > this. I think it's a pretty horrid modularity violation regardless of
> > what text appears in EXPLAIN. I think if we merge these nodes then we
> > may as well go further and merge in other simple nodes like LIMIT.
>
> Huh? That doesn't make any sense. LIMIT is applicable to every single
> node type with the exception of hash. The caching you talk about is
> applicable only to node types that parametrize their sub-nodes, of which
> there are exactly two instances.
>
> Limit doesn't shuttle through huge amounts of tuples normally. What you
> talk about does.
>
>
>
> > Also, just in case anyone is misunderstanding this Andres' argument.
> > It's entirely based on the performance impact of having an additional
> > node.
>
> Not entirely, no. It's also just that it doesn't make sense to have two
> nodes setting parameters that then half magically picked up by a special
> subsidiary node type and used as a cache key. This is pseudo modularity,
> not real modularity. And makes it harder to display useful information
> in explain etc. And makes it harder to e.g. clear the cache in cases we
> know that there's no further use of the current cache. At least without
> piercing the abstraction veil.
>
>
Sorry that I missed this when I replied to the last thread. I understand
this, I remain neutral about this.

> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in increase in performance.
>
>
> > I understand that you've voiced your feelings about this, but what I
> > want to know is, how strongly do you feel about overloading the node?
> > Will you stand in my way if I want to push ahead with the separate
> > node? Will anyone else?
>
> I feel pretty darn strongly about this. If there's plenty people on your
> side I'll not stand in your way, but I think this is a bad design based on
> pretty flimsy reasons.
>
> Greetings,
>
> Andres Freund
>
>
>

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-08-25 17:42:29 Re: some unused parameters cleanup
Previous Message Andy Fan 2020-08-25 17:18:34 Re: Hybrid Hash/Nested Loop joins and caching results from subplans