Quick Links

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date:	2020-05-20 12:56:34
Message-ID:	CANP8+jLkh_g7Eu-=WoNP7P-Mi2Ye=Kt-5kceho80Xnd8xWrvfw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 20 May 2020 at 12:44, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> Hackers,
>
> Over on [1], Heikki mentioned about the usefulness of caching results
> from parameterized subplans so that they could be used again for
> subsequent scans which have the same parameters as a previous scan.
> On [2], I mentioned that parameterized nested loop joins could see
> similar gains with such a cache. I suggested there that instead of
> adding code that only allows this to work for subplans, that instead,
> we add a new node type that can handle the caching for us. We can
> then just inject that node type in places where it seems beneficial.
>

Very cool

> I've attached a patch which implements this. The new node type is
> called "Result Cache". I'm not particularly wedded to keeping that
> name, but if I change it, I only want to do it once. I've got a few
> other names I mind, but I don't feel strongly or confident enough in
> them to go and do the renaming.
>
> How the caching works:
>
> First off, it's only good for plugging in on top of parameterized
> nodes that are rescanned with different parameters. The cache itself
> uses a hash table using the simplehash.h implementation. The memory
> consumption is limited to work_mem. The code maintains an LRU list and
> when we need to add new entries but don't have enough space to do so,
> we free off older items starting at the top of the LRU list. When we
> get a cache hit, we move that entry to the end of the LRU list so that
> it'll be the last to be evicted.
>
> When should we cache:
>
> For nested loop joins, the decision is made purely based on cost.

I thought the main reason to do this was the case when the nested loop
subplan was significantly underestimated and we realize during execution
that we should have built a hash table. So including this based on cost
alone seems to miss a trick.

> The patch does rely heavily on good ndistinct estimates.

Exactly. We know we seldom get those with many-way joins.

So +1 for adding this technique. My question is whether it should be added
as an optional facility of a parameterised sub plan, rather than an
always-needed full-strength node. That way the choice of whether to use it
can happen at execution time once we notice that we've been called too many
times.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
Mission Critical Databases

In response to

Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-05-20 11:44:27 from David Rowley

Responses

Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-05-21 23:53:51 from David Rowley
Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-05-22 00:11:51 from Andy Fan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Vik Fearing	2020-05-20 13:04:20	Re: SEARCH and CYCLE clauses
Previous Message	Atsushi Torikoshi	2020-05-20 12:56:04	Re: Is it useful to record whether plans are generic or custom?