Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-24 20:26:40
Message-ID: 20200824202640.zihclnxkmho3f5cd@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote:
> On 2020-Aug-19, David Rowley wrote:
>
> > Andres' suggestion:
> >
> > regression=# explain (analyze, costs off, timing off, summary off)
> > select count(*) from tenk1 t1 inner join tenk1 t2 on
> > t1.twenty=t2.unique1;
> > QUERY PLAN
> > ---------------------------------------------------------------------------------------
> > Aggregate (actual rows=1 loops=1)
> > -> Nested Loop (actual rows=10000 loops=1)
> > Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
> > -> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
> > -> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
> > Index Cond: (unique1 = t1.twenty)
> > (6 rows)
>
> I think it doesn't look terrible in the SubPlan case -- it kinda makes
> sense there -- but for nested loop it appears really strange.

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

> On the performance aspect, I wonder what the overhead is, particularly
> considering Tom's point of making these nodes more expensive for cases
> with no caching.

I doubt it, due to being a well predictable branch. But it's also easy
enough to just have a different Exec* function for the caching and
non-caching case, should that turn out to be a problem.

> And also, as the JIT saga continues, aren't we going to get plan trees
> recompiled too, at which point it won't matter much?

That's a fair bit out, I think. And even then it'll only help for
queries that run long enough (eventually also often enough, if we get
prepared statement JITing) to be worth JITing.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-08-24 20:42:35 Re: Continuing instability in insert-conflict-specconflict test
Previous Message Pavel Stehule 2020-08-24 19:43:49 Re: Row estimates for empty tables