Re: Subplan result caching

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Subplan result caching
Date: 2020-04-26 07:07:00
Message-ID: CAKU4AWquaTPwTyik=rj7Nm-Fea6D0yyCwQjBQ-WLR_h9cKUf2A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 26, 2020 at 2:11 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > I've been working on a patch to add a little cache to SubPlans, to speed
> > up queries with correlated subqueries, where the same subquery is
> > currently executed multiple times with the same parameters. The idea is
> > to cache the result of the subplan, with the correlation vars as the
> > cache key.
>
> Hi,
>
> This seems like an interesting area to make improvements, so I've
> signed up to review the patch.
>
> From looking at the code I see that the caching is being done inside
> nodeSubplan.c. I don't think this is the right approach to the
> problem. The problem exists for any parameterized path, so I think a
> more general approach would be much better.
>

I'm +1 on this suggestion. Actually, I'm working on a query like this:

select j1o.i, j2_v.sum_5
from j1 j1o
inner join lateral
    (select im100, sum(im5) as sum_5
     from j2
     where j1o.im100 = im100
       and j1o.i = 1
     group by im100) j2_v
    on true
where j1o.i = 1;

I hope we can have a result cache for j2_v, but that has nothing to do with SubPlan.

If we want to handle this case as well, one of the changes would be that the
cache needs to hold multiple records for one input parameter, or to return
one row at a time but return multiple times for the same input parameter.
A tuplestore may be a good option for this, since it already provides the
needed functionality, such as tuplestore_puttuple and tuplestore_gettupleslot.
But if we implement it with a tuplestore, the next question is how to control
the memory usage of this node. We could use a dedicated memory context to know
how much memory the node uses in total, but we can't stop the tuplestore from
using more memory. Or we could forcibly set the current tuplestore's status to
TSS_WRITEFILE and set allowedMem to 0 for the following tuplestores once we
find that too much memory is used. However, that looks like a bit of a hack.
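
Just to make the "multiple rows per parameter" idea more concrete, here is a
very rough C sketch of what one cache entry could look like. ResultCacheEntry,
result_cache_put, result_cache_get_next and the remaining_kbytes budget are
made-up names for illustration only, not anything from Heikki's patch:

#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/tuplestore.h"

/* One cache entry per distinct correlation-parameter value. */
typedef struct ResultCacheEntry
{
    Datum            param_val;      /* cache key */
    bool             param_isnull;
    Tuplestorestate *tuples;         /* every row produced for this key */
} ResultCacheEntry;

/*
 * Store one row.  'remaining_kbytes' is a per-node budget maintained by the
 * caller; once it drops to zero, new tuplestores are created with a 0 kB
 * limit so they spill to disk immediately -- the "hack" described above.
 */
static void
result_cache_put(ResultCacheEntry *entry, TupleTableSlot *slot,
                 int remaining_kbytes)
{
    if (entry->tuples == NULL)
        entry->tuples = tuplestore_begin_heap(true,   /* allow rescan */
                                              false,  /* not cross-xact */
                                              Max(remaining_kbytes, 0));
    tuplestore_puttupleslot(entry->tuples, slot);
}

/* Replay the cached rows when the same parameter value shows up again. */
static bool
result_cache_get_next(ResultCacheEntry *entry, TupleTableSlot *slot)
{
    return tuplestore_gettupleslot(entry->tuples, true /* forward */,
                                   false /* no copy */, slot);
}

The tricky part remains the budget itself, since each tuplestore only
respects its own allowedMem.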

As a summary of my earlier thinking on this topic, here are two other
(incomplete) options for handling this case. 1) Use a Material path to cache
the result for the last parameter only: if the next value is the same as the
last one, reuse the cached result; otherwise rescan the Material path.
2) Consider ordering the rows by the input parameter key, which would be much
more cache friendly.
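
For option 1, the rescan logic could be roughly as small as the sketch below,
assuming the caller has already compared the new parameter datum with the
previous one. last_param_cache_rescan and the refill callback are hypothetical
names:

#include "postgres.h"
#include "utils/tuplestore.h"

static void
last_param_cache_rescan(Tuplestorestate *cache, bool param_unchanged,
                        void (*refill)(Tuplestorestate *cache))
{
    if (param_unchanged)
        tuplestore_rescan(cache);    /* replay the rows we already have */
    else
    {
        tuplestore_clear(cache);     /* drop the stale result ... */
        refill(cache);               /* ... and rebuild it for the new value */
    }
}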

BTW, I see that in many places node->chgParams is changed without checking
whether the datum actually differs from the previous one, which may lose some
optimization opportunities (for example, in the Agg node rescan case, if the
params have not changed, we could reuse the hash table we built before). So
if we add a parameter to node->chgParams only when its datum has really
changed, will it still be semantically correct?
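
To illustrate the chgParams question, a "changed-only" check could look
roughly like the sketch below. mark_param_if_changed is hypothetical, and it
assumes we keep a copy of the previously passed datum somewhere, which the
executor does not do today:

#include "postgres.h"
#include "nodes/bitmapset.h"
#include "utils/datum.h"

/*
 * Add 'paramid' to chgParams only when the new value really differs from
 * the one used for the previous execution.
 */
static Bitmapset *
mark_param_if_changed(Bitmapset *chgParams, int paramid,
                      Datum oldval, bool oldnull,
                      Datum newval, bool newnull,
                      bool typbyval, int typlen)
{
    bool        changed;

    if (oldnull || newnull)
        changed = (oldnull != newnull);
    else
        changed = !datumIsEqual(oldval, newval, typbyval, typlen);

    if (changed)
        chgParams = bms_add_member(chgParams, paramid);

    return chgParams;
}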

It's great to see someone working on this.
>

I'd like to have a try.

Best Regards
Andy Fan
