Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-11-08 14:52:43
Message-ID: CAKU4AWpoQ_c9Rj2sTVPMGxAb+CAAe4z5ejcrL-89QirCxk=k+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming. Again,
> it does seem that v9 is slower than v8.
>

I run your test case on v8 and v9, I can produce a stable difference
between them.

v8:
statement latencies in milliseconds:
1603.611 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

v9:
statement latencies in milliseconds:
1772.287 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

then I did a perf on the 2 version, Is it possible that you
called tts_minimal_clear twice in
the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple
called tts_minimal_clear
on the same slot.

With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c
b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache,
MRUCacheKey *key)
TupleTableSlot *tslot = mrucache->tableslot;
int numKeys = mrucache->nkeys;

- ExecClearTuple(pslot);
-
if (key == NULL)
{
+ ExecClearTuple(pslot);
/* Set the probeslot's values based on the current
parameter values */
for (int i = 0; i < numKeys; i++)
pslot->tts_values[i] =
ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
{
mrucache->state =
MRUCACHE_FETCH_NEXT_TUPLE;

-
ExecClearTuple(mrucache->cachefoundslot);
+ //
ExecClearTuple(mrucache->cachefoundslot);
slot =
mrucache->cachefoundslot;

ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
return NULL;
}

- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;

ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;

v9 has the following result:
1608.048 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is. I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David
>

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2020-11-08 14:59:19 Re: pgbench stopped supporting large number of client connections on Windows
Previous Message Andy Fan 2020-11-08 13:26:51 Re: Hybrid Hash/Nested Loop joins and caching results from subplans