Re: Cached plans and statement generalization

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 15:11:09
Message-ID: f0bfcdd6-dba3-e9a8-2108-146e1f880839@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 24.04.2017 21:43, Andres Freund wrote:
> Hi,
>
> On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
>> So what I am thinking now is implicit query caching. If the same query with
>> different literal values is repeated many times, then we can try to
>> generalize this query and replace it with prepared query with
>> parameters.
> That's not actuall all that easy:
> - You pretty much do parse analysis to be able to do an accurate match.
> How much overhead is parse analysis vs. planning in your cases?
> - The invalidation infrastructure for this, if not tied to to fully
> parse-analyzed statements, is going to be hell.
> - Migrating to parameters can actually cause significant slowdowns, not
> nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

So autoprepare is as efficient as explicit prepare and can increase
performance almost two times.

My current implementation is replacing with parameters only string
literals in the query, i.e. select * from T where x='123'; -> select *
from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to
locate '\'' character and then correctly handle pairs of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly
understand whether minus should be treated as part of literal or as
operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where
parameters can not be used, for example "order by 1".

Fully correct substitution can be done by first performing parsing the
query, then transform parse tree, replacing literal nodes with parameter
nodes and finally deparse tree into generalized query. postgres_fdw
already contains such deparse code. It can be moved to postgres core and
reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time
needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.

There is obvious question: how I managed to get this pgbench results if
currently only substitution of string literals is supported and queries
constructed by pgbench don't contain string literals? I just made small
patch in pgbench replaceVariable method wrapping value's representation
in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I attached my patch to this mail. It is just first version of the patch
(based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is
considered to be useful, I will continue work on this patch.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare.patch text/x-patch 18.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2017-04-25 15:13:08 Re: Separation walsender & normal backends
Previous Message Andrew Dunstan 2017-04-25 15:09:39 Re: TAP tests - installcheck vs check