Re: Cached/global query plans, autopreparation

From: Jorge Solórzano <jorsol(at)gmail(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cached/global query plans, autopreparation
Date: 2018-02-15 14:31:09
Message-ID: CA+cVU8PdUD4O7ntwfU2ZASiFTfVvF2BJngYVj9tDY4p+CCLCVA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 15, 2018 at 8:00 AM, Shay Rojansky <roji(at)roji(dot)org> wrote:

>> I am the author of one of the proposals (autoprepare, which is in the
>> commit fest now), but I think that sooner or later Postgres has to
>> come to a solution with shared DB caches/prepared plans.
>> Please correct me if I am wrong, but it seems to me that most of the
>> other top DBMSes have something like this.
>> Such a solution can provide a lot of different advantages:
>> 1. Better memory utilization: no need to store the same data N times,
>> where N is the number of backends, or to spend time warming the cache.
>> 2. The optimizer can spend more time choosing a better plan, which can
>> then be used by all clients. Even now, the compilation time of some
>> queries exceeds their execution time several times over.
>> 3. It is simpler to add facilities for query plan tuning and
>> maintenance (storing, comparing, ...).
>> 4. It makes it possible to control the size of memory used by caches.
>> Right now, the catalog cache for a DB with hundreds of thousands of
>> tables and indexes, multiplied by hundreds of backends, can consume
>> terabytes of memory.
>> 5. Shared caches can simplify the invalidation mechanism.
>> 6. Almost all enterprise systems working with Postgres have to use
>> some kind of connection pooling (pgbouncer, pgpool, ...), which almost
>> excludes the possibility of using prepared statements and can slow
>> down performance by up to a factor of two.
>>
>
> Just wanted to say I didn't see this email before my previous response,
> but I agree with all of the above. The last point is particularly
> important, especially for short-lived connection scenarios, the most
> typical of which is the web.
>
>
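
To illustrate that last point (a minimal sketch; the statement and table
names here are made up): prepared statements live in a single backend's
session, so a transaction-level pooler that hands a later transaction to
a different backend makes them disappear:

    PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1;
    EXECUTE get_user(42);  -- fine while we stay on the same backend

    -- after the pooler routes a later transaction to another backend:
    EXECUTE get_user(42);
    -- ERROR:  prepared statement "get_user" does not exist
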
>> There is just one (but very important) problem which needs to be
>> solved: access to the shared cache has to be synchronized.
>> But there are a lot of other shared resources in Postgres (procarray,
>> shared buffers, ...), so I do not think this is an unsolvable problem
>> or that it has to degrade performance.
>>
>> So it seems obvious that shared caches/plans can provide a lot of
>> advantages, but the value of these advantages for real customers is
>> still not clear to me.
>> Using the -M prepared protocol in a pgbench workload can improve speed
>> by up to two times, but when I asked real Postgres users at Avito,
>> Yandex and MyOffice, they told me that on their workloads the
>> advantage of prepared statements is about 10%.
>> A 10% performance improvement is definitely not good compensation for
>> rewriting a substantial part of the Postgres core...
>>
>
> Just wanted to say that I've seen more than a 10% improvement in some
> real-world applications when preparation was done properly. Also, I'm
> assuming that implementing this wouldn't involve "rewriting a
> substantial part of Postgres core", and that even 10% is quite a big
> gain, especially if it's a transparent/free one as far as the user is
> concerned (no application changes).
>
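
For context on what -M prepared changes (a rough SQL-level sketch; the
pgbench query is simplified and the statement name is made up): with the
prepared protocol, each benchmark query is parsed and planned once per
connection, and later iterations only bind and execute it, roughly as if
pgbench had issued:

    -- simple protocol: parsed, rewritten and planned on every execution
    SELECT abalance FROM pgbench_accounts WHERE aid = 42;

    -- prepared protocol: parse/plan once per session, then just execute
    PREPARE get_balance (int) AS
        SELECT abalance FROM pgbench_accounts WHERE aid = $1;
    EXECUTE get_balance(42);
    EXECUTE get_balance(137);

Trivial pgbench statements spend a large share of their time in
parse/plan, which may be why the benchmark gain (up to 2x) is much
larger than the ~10% reported for heavier real-world queries.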

A 10% improvement in the real world can be pretty significant. I don't
know how complicated it would be to implement this in the Postgres core;
how about adding it to the GSoC 2018 ideas[1]?

[1] https://wiki.postgresql.org/wiki/GSoC_2018
