Re: Cached plans and statement generalization

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 19:41:45
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/11/2017 06:12 PM, Bruce Momjian wrote:
> On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
>> I am going to continue work on this patch I will be glad to receive any
>> feedback and suggestions for its improvement.
>> In most cases, applications are not accessing Postgres directly, but using
>> some connection pooling layer and so them are not able to use prepared
>> statements.
>> But at simple OLTP Postgres spent more time on building query plan than on
>> execution itself. And it is possible to speedup Postgres about two times at
>> such workload!
>> Another alternative is true shared plan cache. May be it is even more
>> perspective approach, but definitely much more invasive and harder to
>> implement.
> Can we back up and get an overview of what you are doing and how you are
> doing it? Our TODO list suggests this order for successful patches:
> Desirability -> Design -> Implement -> Test -> Review -> Commit
> You kind of started at the Implementation/patch level, which makes it
> hard to evaluate.
> I think everyone agrees on the Desirability of the feature, but the
> Design is the tricky part. I think the design questions are:
> * What information is stored about cached plans?
> * How are the cached plans invalidated?
> * How is a query matched against a cached plan?
> Looking at the options, ideally the plan would be cached at the same
> query stage as the stage where the incoming query is checked against the
> cache. However, caching and checking at the same level offers no
> benefit, so they are going to be different. For example, caching a
> parse tree at the time it is created, then checking at the same point if
> the incoming query is the same doesn't help you because you already had
> to create the parse tree get to that point.
> A more concrete example is prepared statements. They are stored at the
> end of planning and matched in the parser. However, you can easily do
> that since the incoming query specifies the name of the prepared query,
> so there is no trick to matching.
> The desire is to cache as late as possible so you cache more work and
> you have more detail about the referenced objects, which helps with
> cache invalidation. However, you also want to do cache matching as
> early as possible to improve performance.
> So, let's look at some options. One interesting idea from Doug Doole
> was to do it between the tokenizer and parser. I think they are glued
> together so you would need a way to run the tokenizer separately and
> compare that to the tokens you stored for the cached plan. The larger
> issue is that prepared plans already are checked after parsing, and we
> know they are a win, so matching any earlier than that just seems like
> overkill and likely to lead to lots of problems.
> So, you could do it after parsing but before parse-analysis, which is
> kind of what prepared queries do. One tricky problem is that we don't
> bind the query string tokens to database objects until after parse
> analysis.
> Doing matching before parse-analysis is going to be tricky, which is why
> there are so many comments about the approach. Changing search_path can
> certainly affect it, but creating objects in earlier-mentioned schemas
> can also change how an object reference in a query is resolved. Even
> obscure things like the creation of a new operator that has higher
> precedence in the query could change the plan, though am not sure if
> our prepared query system even handles that properly.
> Anyway, that is my feedback. I would like to get an overview of what
> you are trying to do and the costs/benefits of each option so we can
> best guide you.
Sorry, for luck of overview.
I have started with small prototype just to investigate if such optimization makes sense or not.
When I get more than two time advantage in performance on standard pgbench, I come to conclusion that this
optimization can be really very useful and now try to find the best way of its implementation.

I have started with simplest approach when string literals are replaced with parameters. It is done before parsing.
And can be done very fast - just need to locate data in quotes.
But this approach is not safe and universal: you will not be able to speedup most of the existed queries without rewriting them.

This is why I have provided second implementation which replace literals with parameters after raw parsing.
Certainly it is slower than first approach. But still provide significant advantage in performance: more than two times at pgbench.
Then I tried to run regression tests and find several situations where type analysis is not correctly performed in case of replacing literals with parameters.

So my third attempt is to replace constant nodes with parameters in already analyzed tree.

Now answering your questions:

* What information is stored about cached plans?

Key to locate cached plan is raw parse tree. Value is saved CachedPlanSource.

* How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

* How is a query matched against a cached plan?

Hash function is calculated for raw parse tree and equal() function is used to compare raw plans with literal nodes replaced with parameters.

Konstantin Knizhnik
Postgres Professional:
The Russian Postgres Company

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2017-05-11 19:48:26 Re: Cached plans and statement generalization
Previous Message Alvaro Herrera 2017-05-11 19:32:36 Re: WITH clause in CREATE STATISTICS