Re: Subplan result caching

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Subplan result caching
Date: 2018-07-09 09:08:07
Message-ID: CAKJS1f-kAk1cGVvzg9TXCLhPsxx_oFVOrTGSR5yTRXKWntTVFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've
signed up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the
problem. The problem exists for any parameterized path, so I think a
more general approach would be much better.

We already have Materialize nodes to cache the results of an entire
subplan, and this seems to have quite a bit in common with that, only
we'd want to cache multiple results with a key to determine which
result set should be returned. Due to the similarities with
Materialize, I think that the cache should be a node itself and not
bury the cache logic in some other node type that's meant for some
other purpose.

"LazyMaterialize" seems like a good option for a name. It seems better
than "LazyHash" since you may not want to restrict it to a hash table
based cache in the future. A binary search tree may be a good option
for types that cannot be hashed.

Materialize nodes are injected above the inner side node of MergeJoins
based on cost, so I think this node type could just do the same. Maybe
something like estimate_num_groups(<exprs being compared to params>) /
path->rows is below some defined constant, perhaps something like 0.5.
Although experimentation would be required. It might be good to take
into account some other cost factors too.

I imagine we'd want to only allow this optimisation for hashjoinable
types. This seems pretty natural since your cache implementation is a
hash table, so, of course, we're going to need a hash function.

Wondering your thoughts on this idea.

I'll mark as waiting on author in the meantime.

It's great to see someone working on this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2018-07-09 09:14:13 Re: Non-reserved replication slots and slot advancing
Previous Message Kyotaro HORIGUCHI 2018-07-09 09:07:24 Re: Let's remove DSM_IMPL_NONE.