Re: Cached/global query plans, autopreparation

From: Shay Rojansky <roji(at)roji(dot)org>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cached/global query plans, autopreparation
Date: 2018-02-15 14:00:17
Message-ID: CADT4RqCvzjyV5sQy_cjR7ys3LA0rS4U5APRrqk0j1phofBw+gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> I am an author of one of the proposal (autoprepare which is in commit fest
> now), but I think that sooner or later Postgres has to come to solution
> with shared DB caches/prepared plans.
> Please correct me if I am wrong, but it seems to me that most of all other
> top DBMSes having something like this.
> Such decision can provide a lot of different advantages:
> 1. Better memory utilization: no need to store the same data N times where
> N is number of backends and spend time for warming cache.
> 2. Optimizer can spend more time choosing better plan which then can be
> used by all clients. Even now time of compilation of some queries several
> times exceeds time of their execution.
> 3. It is simpler to add facilities for query plan tuning and maintaining
> (storing, comparing,...)
> 4. It make is possible to control size of memory used by caches. Right now
> catalog cache for DB with hundred thousands and tables and indexes
> multiplied by hundreds of backends can consume terabytes of memory.
> 5. Shared caches can simplify invalidation mechanism.
> 6. Almost all enterprise systems working with Postgres has to use some
> kind of connection pooling (pgbouncer, pgpool,...). It almost exclude
> possibility to use prepared statements. Which can slow down performance up
> to two times.
>

Just wanted to say I didn't see this email before my previous response, but
I agree with all of the above. The last point is particularly important,
especially for short-lived connection scenarios, the most typical of which
is web.

> There is just one (but very important) problem which needs to be solved:
> access to shared cache should be synchronized.
> But there are a lot of other shared resources in Postgres (procarray,
> shared buffers,...). So I do not think that it is unsolvable problem and
> that it can cause degrade of performance.
>
> So it seems to be obvious that shared caches/plans can provide a lot of
> advantages. But it is still not clear to me the value of this advantages
> for real customers.
> Using -M prepared protocol in pgbench workload can improve speed up to
> two times. But I have asked real Postgres users in Avito, Yandex, MyOffice
> and them told me
> that on their workloads advantage of prepared statements is about 10%. 10%
> performance improvement is definitely not a good compensation for rewriting
> substantial part of Postgres core...
>

Just wanted to say that I've seen more than 10% improvement in some
real-world application when preparation was done properly. Also, I'm
assuming that implementing this wouldn't involve "rewriting substantial
part of Postgres core", and that even 10% is quite a big gain, especially
if it's a transparent/free one as far as the user is concerned (no
application changes).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-02-15 14:17:54 Re: [HACKERS] why not parallel seq scan for slow functions
Previous Message Shay Rojansky 2018-02-15 13:54:14 Re: Cached/global query plans, autopreparation