Re: Cached/global query plans, autopreparation

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cached/global query plans, autopreparation
Date: 2018-03-02 20:29:09
Message-ID: 20180302202909.GB27946@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 15, 2018 at 03:00:17PM +0100, Shay Rojansky wrote:
> Just wanted to say that I've seen more than 10% improvement in some real-world
> application when preparation was done properly. Also, I'm assuming that
> implementing this wouldn't involve "rewriting substantial part of Postgres
> core", and that even 10% is quite a big gain, especially if it's a transparent/
> free one as far as the user is concerned (no application changes).

I would like to step back on this issue. Ideally, every query would get
re-optimized because we can only be sure the plan is optimal when we use
supplied constants to generate the plan. But, of course, parsing and
planning take time, so there ideally would be an way to avoid it. The
question is always how much time will be saved by avoiding
parsing/planning, and what risk is there of suboptimal plans.

Postgres uses a conservative method for reusing plans with previous
constants, as described in the PREPARE manual page:

https://www.postgresql.org/docs/10/static/sql-prepare.html
Prepared statements can use generic plans rather than re-planning with
each set of supplied EXECUTE values. This occurs immediately for prepared
statements with no parameters; otherwise it occurs only after five or more
executions produce plans whose estimated cost average (including planning
overhead) is more expensive than the generic plan cost estimate. Once
a generic plan is chosen, it is used for the remaining lifetime of the
prepared statement. Using EXECUTE values which are rare in columns with
many duplicates can generate custom plans that are so much cheaper than
the generic plan, even after adding planning overhead, that the generic
plan might never be used.

While I have heard people complain about how other databases cache
prepare plans, I have heard few complaints about the Postgres approach,
and I haven't even heard of people asking to control the documented "five
or more" behavior.

I also know that other database products have more sophisticated prepare
usage, but they might have higher parse/plan overhead, or they might be
more flexible in handling specialized workloads, which Postgres might
not want to handle, given the costs/complexity/overhead.

So, the directions for improvement are:

1 Improve the existing "five or more" behavior
2 Automatically prepare queries that are not sent as prepared queries
3 Share plans among sessions

While #1 would be nice, #2 increases the number of applications that can
silently benefit from prepared queries, and #3 improves the number of
cases that query plans can be reused. The issue with #3 is that the
constants used are no longer local to the session (which is the same
issue with connection poolers reusing prepared plans). When different
sessions with potentially more varied constants reuse plans, the
probability of suboptimal plans increases.

I think the fact that pgbench shows a 2x improvement for prepared
statements, and real world reports are a 10% improvement means we need
to have a better understanding of exactly what workloads can benefit
from this, and a comprehensive analysis of all three areas of
improvement.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-02 20:38:43 Re: Cached/global query plans, autopreparation
Previous Message Robert Haas 2018-03-02 20:27:57 Re: Testing "workers launched" in expected output? Really?