From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2020-08-24 20:26:40 |
Message-ID: | 20200824202640.zihclnxkmho3f5cd@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote:
> On 2020-Aug-19, David Rowley wrote:
>
> > Andres' suggestion:
> >
> > regression=# explain (analyze, costs off, timing off, summary off)
> > select count(*) from tenk1 t1 inner join tenk1 t2 on
> > t1.twenty=t2.unique1;
> > QUERY PLAN
> > ---------------------------------------------------------------------------------------
> > Aggregate (actual rows=1 loops=1)
> > -> Nested Loop (actual rows=10000 loops=1)
> > Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
> > -> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
> > -> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
> > Index Cond: (unique1 = t1.twenty)
> > (6 rows)
>
> I think it doesn't look terrible in the SubPlan case -- it kinda makes
> sense there -- but for nested loop it appears really strange.
While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.
> On the performance aspect, I wonder what the overhead is, particularly
> considering Tom's point of making these nodes more expensive for cases
> with no caching.
I doubt it, due to being a well predictable branch. But it's also easy
enough to just have a different Exec* function for the caching and
non-caching case, should that turn out to be a problem.
> And also, as the JIT saga continues, aren't we going to get plan trees
> recompiled too, at which point it won't matter much?
That's a fair bit out, I think. And even then it'll only help for
queries that run long enough (eventually also often enough, if we get
prepared statement JITing) to be worth JITing.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2020-08-24 20:42:35 | Re: Continuing instability in insert-conflict-specconflict test |
Previous Message | Pavel Stehule | 2020-08-24 19:43:49 | Re: Row estimates for empty tables |