Re: Subplan result caching

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Subplan result caching
Date: 2020-04-26 12:35:31
Message-ID: CAKU4AWq3rARimEGKnVfyp6BgcBT4S1gTgwtie_JHihSSZdrnUQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 26, 2020 at 5:49 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > If we want to handle this case as well, one of the changes would
> > be that it needs to cache multiple records for one input parameter,
> > or return one row at a time but return multiple times for one input
> > parameter. Tuplestore may be a good option for this case, since it
> > already provides the functionality we need, such as
> > tuplestore_puttuple and tuplestore_gettupleslot. But if we implement
> > it with a tuplestore, the next question is how to control the memory
> > usage for this node. We can use a dedicated memory context to know
> > how much memory this node has used in total, but we can't stop the
> > tuplestore from using more memory. Or we could forcibly set the
> > current tuplestore's status to TSS_WRITEFILE and set allowedMem to 0
> > for the following tuplestores, once we find that too much memory has
> > been used. However, this looks like a bit of a hack.
>
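(For reference, here is roughly what I had in mind with the tuplestore
idea. This is only an illustrative sketch; the struct and function names
are invented for this email and are not from any patch:)

#include "postgres.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "utils/tuplestore.h"

/* One cache entry per distinct input parameter (simplified to one Datum) */
typedef struct SubplanCacheEntry
{
    Datum       paramval;       /* the parameter value this entry is keyed on */
    Tuplestorestate *store;     /* every row the subplan returned for it */
} SubplanCacheEntry;

/* On a cache miss, run the subplan to completion and remember its rows */
static void
fill_cache_entry(SubplanCacheEntry *entry, PlanState *subnode)
{
    TupleTableSlot *slot;

    /* work_mem caps the in-memory size before the tuplestore spills to disk */
    entry->store = tuplestore_begin_heap(false, false, work_mem);
    for (;;)
    {
        slot = ExecProcNode(subnode);
        if (TupIsNull(slot))
            break;
        tuplestore_puttupleslot(entry->store, slot);
    }
}
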
> I didn't imagine a tuplestore would be that useful for this. A node
> like this will do its best work when the ratio of n_values /
> distinct_values of the parameters is high. The planner is often not
> that great at knowing the number of distinct values, especially so
> when there is more than one expression to estimate the number of
> distinct values for (we added extended statistics to try to help with
> that). I think this node will do its best when the time spent on a
> cache miss is barely any more expensive than scanning the subnode to
> get the results. If we can do that then we'll see fewer regressions
> for when we inject one of these nodes where it'll do no good, e.g.
> when we'll never get a repeated value. If we start spilling these
> tuples out to disk then it adds overhead which might never pay off.
>
> I'd suggest a hash table to act as an MRU cache: we keep the most
> recently used values and evict the oldest ones when we run out of
> space, i.e. when we've consumed all of work_mem.
>
> I've got a bunch of code locally which is still a work in progress to
> do this. I'll finish it off and post it here.

I was thinking that we may have to maintain some extra state if we use a
hash table rather than a tuplestore, but that might not be a major
concern; to be concrete, what I have in mind is sketched below. I'll
wait and see your patch.
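
The extra state is roughly an LRU list threaded through the hash
entries, so the node knows which entry to evict once the cache exceeds
work_mem. A minimal sketch (again, all names here are invented for
illustration, certainly not your actual WIP code):

#include "postgres.h"
#include "lib/ilist.h"
#include "utils/hsearch.h"

/* One hash entry per distinct parameter value; the key must come first */
typedef struct CacheEntry
{
    Datum       key;            /* parameter value, simplified to one Datum */
    dlist_node  lru_node;       /* this entry's place in the LRU list */
    Size        bytes;          /* memory attributed to this entry */
    /* ... the cached tuples for this key would live here ... */
} CacheEntry;

typedef struct ResultCache
{
    HTAB       *hashtab;        /* key -> CacheEntry */
    dlist_head  lru_list;       /* most recently used entries at the head */
    Size        mem_used;       /* bytes accounted for by hand */
    Size        mem_limit;      /* work_mem * 1024L */
} ResultCache;

/* On every cache hit, bump the entry to the front of the LRU list */
static void
touch_entry(ResultCache *cache, CacheEntry *entry)
{
    dlist_move_head(&cache->lru_list, &entry->lru_node);
}

/* When over budget, evict entries from the tail until we fit again */
static void
evict_until_fits(ResultCache *cache)
{
    while (cache->mem_used > cache->mem_limit &&
           !dlist_is_empty(&cache->lru_list))
    {
        dlist_node *node = dlist_tail_node(&cache->lru_list);
        CacheEntry *victim = dlist_container(CacheEntry, lru_node, node);

        dlist_delete(&victim->lru_node);
        cache->mem_used -= victim->bytes;
        /* freeing the victim's cached tuples is elided in this sketch */
        hash_search(cache->hashtab, &victim->key, HASH_REMOVE, NULL);
    }
}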

Best Regards
Andy Fan
