Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-25 15:52:40
Message-ID: 20200825155240.pjuej4i2c6hxrkbl@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > While I'm against introducing a separate node for the caching, I'm *not*
> > against displaying a different node type when caching is
> > present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
> > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > probably still want to display the 'Cache Key' similar to your example,
> > but I don't see how it'd be better to display it with one more
> > intermediary node.
>
> ...Well, this is difficult... For the record, in case anyone missed
> it, I'm pretty set on being against doing any node overloading for
> this. I think it's a pretty horrid modularity violation regardless of
> what text appears in EXPLAIN. I think if we merge these nodes then we
> may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.

> Also, just in case anyone is misunderstanding this Andres' argument.
> It's entirely based on the performance impact of having an additional
> node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.

> However, given the correct planner choice, there will never be
> a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

> I understand that you've voiced your feelings about this, but what I
> want to know is, how strongly do you feel about overloading the node?
> Will you stand in my way if I want to push ahead with the separate
> node? Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-25 16:04:41 Re: Continuing instability in insert-conflict-specconflict test
Previous Message Ranier Vilela 2020-08-25 15:42:17 [PATCH] Dereference null return value (NULL_RETURNS) (src/backend/commands/statscmds.c)