Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-11 05:44:22
Message-ID: 20200811054422.svlohrubnx5xjwfx@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-08-11 17:23:42 +1200, David Rowley wrote:
> On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2020-07-09 10:25:14 +1200, David Rowley wrote:
> > > On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > I'm not convinced it's a good idea to introduce a separate executor node
> > > > for this. There's a fair bit of overhead in them, and they will only be
> > > > below certain types of nodes afaict. It seems like it'd be better to
> > > > pull the required calls into the nodes that do parametrized scans of
> > > > subsidiary nodes. Have you considered that?
> > >
> > > I see 41 different node types mentioned in ExecReScan(). I don't
> > > really think it would be reasonable to change all those.
> >
> > But that's because we dispatch ExecReScan mechanically down to every
> > single executor node. That doesn't determine how many nodes would need
> > to be modified to include explicit caching? What am I missing?
> >
> > Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
> > integration?
>
> hmm, I think you're right there about those two node types. I'm just
> not sure you're right about overloading these node types to act as a
> cache.

I'm not 100% sure either, to be clear. I am just acutely aware that adding
entire nodes is pretty expensive, and that there's, afaict, no need to
have arbitrary (i.e. pointer to function) type callbacks to point to the
cache.

> How would you inform users via EXPLAIN ANALYZE of how many
> cache hits/misses occurred?

Similar to how we display memory for sorting etc.

> What would you use to disable it for an
> escape hatch for when the planner makes a bad choice about caching?

Isn't that *easier* when embedding it into the node? There's no nice way
to remove an intermediary executor node entirely, but it's trivial to
have an if statement like
    if (node->cache && upsert_cache(node->cache, param))
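
To make the shape of that concrete, here is a minimal standalone sketch. It is
not PostgreSQL code: ParamCache, JoinNode, rescan_inner() and
join_fetch_inner() are hypothetical names invented for illustration, the
"inner side" is a stand-in computation, and the cache is a toy direct-mapped
table keyed by a single integer parameter. It only shows that a NULL cache
pointer can act as the off switch, and that hit/miss counters kept in the node
could be reported much like sort memory is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64          /* arbitrary size chosen for the sketch */

/* Hypothetical direct-mapped cache: one parameter value -> one result. */
typedef struct ParamCache
{
    bool        used[CACHE_SLOTS];
    int64_t     key[CACHE_SLOTS];
    int64_t     value[CACHE_SLOTS];
    uint64_t    hits;           /* counters an EXPLAIN-like report could show */
    uint64_t    misses;
} ParamCache;

/* Hypothetical parent node that rescans a parameterized inner side. */
typedef struct JoinNode
{
    ParamCache *cache;          /* NULL means caching is disabled */
    uint64_t    inner_rescans;  /* how often the inner side really ran */
} JoinNode;

/* Stand-in for the expensive parameterized rescan of the inner side. */
static int64_t
rescan_inner(JoinNode *node, int64_t param)
{
    node->inner_rescans++;
    return param * 2;
}

/* Returns true and sets *result on a hit; counts the miss otherwise. */
static bool
cache_lookup(ParamCache *cache, int64_t param, int64_t *result)
{
    size_t      slot = (size_t) ((uint64_t) param % CACHE_SLOTS);

    assert(cache != NULL);
    if (cache->used[slot] && cache->key[slot] == param)
    {
        cache->hits++;
        *result = cache->value[slot];
        return true;
    }
    cache->misses++;
    return false;
}

/* Stores a result, overwriting whatever occupied the slot before. */
static void
cache_store(ParamCache *cache, int64_t param, int64_t result)
{
    size_t      slot = (size_t) ((uint64_t) param % CACHE_SLOTS);

    cache->used[slot] = true;
    cache->key[slot] = param;
    cache->value[slot] = result;
}

/* The embedded check: a single branch in the node that owns the rescan. */
int64_t
join_fetch_inner(JoinNode *node, int64_t param)
{
    int64_t     result;

    if (node->cache && cache_lookup(node->cache, param, &result))
        return result;          /* hit: the inner side is not touched */

    result = rescan_inner(node, param);
    if (node->cache)
        cache_store(node->cache, param, result);
    return result;
}
```

With node->cache set, a repeated parameter is answered from the cache and only
the first call reaches rescan_inner(); with node->cache left NULL, every call
rescans, which is the escape hatch without removing any node from the plan.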

Greetings,

Andres Freund
