Re: Cached/global query plans, autopreparation

From: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cached/global query plans, autopreparation
Date: 2018-03-03 06:22:26
Message-ID: E95A4A8A-8E0F-40D9-9884-83DA238B1EBE@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Mar 2, 2018, at 11:29 PM, Bruce Momjian wrote:

> On Thu, Feb 15, 2018 at 03:00:17PM +0100, Shay Rojansky wrote:
>> Just wanted to say that I've seen more than 10% improvement in some real-world
>> application when preparation was done properly. Also, I'm assuming that
>> implementing this wouldn't involve "rewriting substantial part of Postgres
>> core", and that even 10% is quite a big gain, especially if it's a transparent/
>> free one as far as the user is concerned (no application changes).
>
> I would like to step back on this issue. Ideally, every query would get
> re-optimized because we can only be sure the plan is optimal when we use
> supplied constants to generate the plan. But, of course, parsing and
> planning take time, so there ideally would be an way to avoid it. The
> question is always how much time will be saved by avoiding
> parsing/planning, and what risk is there of suboptimal plans.
>
> Postgres uses a conservative method for reusing plans with previous
> constants, as described in the PREPARE manual page:
>
> https://www.postgresql.org/docs/10/static/sql-prepare.html
> Prepared statements can use generic plans rather than re-planning with
> each set of supplied EXECUTE values. This occurs immediately for prepared
> statements with no parameters; otherwise it occurs only after five or more
> executions produce plans whose estimated cost average (including planning
> overhead) is more expensive than the generic plan cost estimate. Once
> a generic plan is chosen, it is used for the remaining lifetime of the
> prepared statement. Using EXECUTE values which are rare in columns with
> many duplicates can generate custom plans that are so much cheaper than
> the generic plan, even after adding planning overhead, that the generic
> plan might never be used.
>
> While I have heard people complain about how other databases cache
> prepare plans, I have heard few complaints about the Postgres approach,
> and I haven't even heard of people asking to control the documented "five
> or more" behavior.

I am sorry, but I think that lack of complaints in this case just mean that people are not using prepared statements in Postgres.
Almost any productional system baed on Postgres has to use some kind of connection pooling (pgbouncer...) and it prevents usage of prepared statements in most cases.
I personally was involved in development in several such system where it was prohibited to use prepared statements because all connections to the database are expected to be done through pgbouncer.

I completely agree with you that general plan can be less efficient than specialized plan. It can be caused by data skew and non-uniform data distribution.
But what is the percent of such queries? I do not have much experience with investigating behavior of Postgres in real production systems.
So I can only guess. In the simplest queries like "select * from T where key=?" it is very inlikely that specialized plan will be more efficient, even in case of non-uniform distribution of "key" values. And for complex plans involving multiples joins Postgres optimizer in any case will not be able to calculate more or less precise estimation after few joins. So in this case
specialized plan with literals instead of placeholders once again will not be more efficient.

This rule with five attempts before switching to generic plan works well for partitioning where generic plan cause query execution at all partition and so is several times less efficient than specialized plan which is able to restrict query execution just to one partition. But I think that in this case right solution is runtime partitioning pruning which will allow to use prepared statements for tables with inheritance.

>
> I also know that other database products have more sophisticated prepare
> usage, but they might have higher parse/plan overhead, or they might be
> more flexible in handling specialized workloads, which Postgres might
> not want to handle, given the costs/complexity/overhead.
>
> So, the directions for improvement are:
>
> 1 Improve the existing "five or more" behavior
> 2 Automatically prepare queries that are not sent as prepared queries
> 3 Share plans among sessions
>
> While #1 would be nice, #2 increases the number of applications that can
> silently benefit from prepared queries, and #3 improves the number of
> cases that query plans can be reused. The issue with #3 is that the
> constants used are no longer local to the session (which is the same
> issue with connection poolers reusing prepared plans). When different
> sessions with potentially more varied constants reuse plans, the
> probability of suboptimal plans increases.
>
> I think the fact that pgbench shows a 2x improvement for prepared
> statements, and real world reports are a 10% improvement means we need
> to have a better understanding of exactly what workloads can benefit
> from this, and a comprehensive analysis of all three areas of
> improvement.

I completely agree with you.
If we have autoprepared statements or shared plans then more sophisticated checks for efficiency of generic query is definitely needed.
Just want to add few words about estimation of complexity of all this approaches.

Autoprepare patch already exists and can improve performance on OLTP workload up to two times.
It will be nice to perform more experiments on real system to estimate real effect of autoprepare. But until this patch is not included in Postgres, it will be hard to perform such experiments. It is very difficult to try effect of prepared statements at real system working through pgbouncer.
And with autoprepare it will be enough just to toggle one GUC variable to perform this experiment.

Implementation of shard cache is definitely the most universal and efficient solution. But its implementation will require a lot of efforts and even if we decided to more in this direction,
I do not think that it will be ready till 11 or even 12 release of Postgres.

It's a pity that we have to loose up to two times of possible Postgres performance just because of lack of any suitable mechanism for preparing queries.

>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2018-03-03 07:42:08 Re: WIP Patch: Precalculate stable functions, infrastructure v1
Previous Message Tom Lane 2018-03-03 05:48:26 Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling