Re: Cached/global query plans, autopreparation

From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached/global query plans, autopreparation
Date: 2018-02-13 17:13:09
Message-ID: CADT4RqAUwvJLtwsKFrp-xCvJXhXOYRkvb7s1aNfp79Yy21cqtg@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Was wondering if anyone has a reaction to my email below about statement
preparation; was it too long? :)

(and sorry for top-posting)

On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky <roji(at)roji(dot)org> wrote:

> Hi all.
>
> Various versions of having PostgreSQL caching and/or autopreparing
> statement plans have been discussed (https://www.postgresql.org/
> message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com, https:/
> /www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-
> e9cf193a74f5%40postgrespro.ru), without clear conclusions or even an
> agreement on what might be worthwhile to implement. I wanted to bring this
> up again from a PostgreSQL driver maintainer's perspective (I'm the owner
> of Npgsql, the open source .NET driver), apologies in advance if I'm
> repeating things or I've missed crucial information. Below I'll describe
> three relevant issues and what I've done to deal with them.
>
> When the same statement is rerun, preparing it yields a very significant
> performance boost. However, in short-lived connection scenarios it's
> frequently not possible to benefit from this - think of a typical webapp
> which allocates a connection from a pool, runs a query and then returns
> the connection. To make sure prepared statements are used, Npgsql's
> connection pool doesn't send DISCARD ALL when a connection is returned (to
> avoid wiping out its prepared statements), and maintains an internal table
> mapping SQL (and parameter types) to a PostgreSQL statement name. The next
> time the application attempts to prepare the same SQL, the prepared
> statement is found in the table and no preparation needs to occur. This
> means that prepared statements persist across pooled connection
> open/close, and are never discarded unless the user uses a specific API (a
> short sketch of this follows the list below). While this works, the
> disadvantages are:
> 1. This kind of mechanism needs to be implemented again and again, in each
> driver.
> 2. It relies on Npgsql's internal pooling, which can track persistent
> prepared statements on physical connections. If an external pool is used
> (e.g. pgpool), this isn't really possible.
> 3. It complicates resetting the session state (instead of DISCARD ALL, a
> combination of all the other reset commands except DEALLOCATE ALL needs to
> be sent). This is minor.
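>
> To make the first point concrete, here is a minimal sketch of how this
> looks from application code (C#; assumes Npgsql 3.2 or later, and the
> connection string and query are just placeholders):
>
>     using Npgsql;
>
>     class PersistentPrepareDemo
>     {
>         const string ConnString =
>             "Host=localhost;Username=app;Password=app;Database=app";
>
>         static void Main()
>         {
>             // First pooled open: Prepare() sends Parse and records the
>             // SQL -> statement-name mapping on the physical connection.
>             using (var conn = new NpgsqlConnection(ConnString))
>             {
>                 conn.Open();
>                 using (var cmd = new NpgsqlCommand("SELECT 1", conn))
>                 {
>                     cmd.Prepare();
>                     cmd.ExecuteScalar();
>                 }
>             }   // returned to the pool without DISCARD ALL
>
>             // Second pooled open (assuming the same physical connection
>             // is handed back): Prepare() finds the SQL in the internal
>             // table and reuses the existing server-side statement, so no
>             // re-preparation occurs.
>             using (var conn = new NpgsqlConnection(ConnString))
>             {
>                 conn.Open();
>                 using (var cmd = new NpgsqlCommand("SELECT 1", conn))
>                 {
>                     cmd.Prepare();
>                     cmd.ExecuteScalar();
>                 }
>             }
>         }
>     }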
>
> The second issue is that many applications don't work directly against
> the database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
> additional layer is used, there's a good chance that that layer doesn't
> prepare in any way, and may even hide access to the database API's
> preparation method. Two examples from the .NET world are Dapper (a very
> popular micro-O/RM) and Entity Framework. In order to provide the best
> possible performance in these scenarios, Npgsql has an opt-in feature
> whereby it tracks how many times a given statement was executed, and once
> it passes a certain threshold automatically prepares it. An LRU cache is
> then used to determine which prepared statements to discard, to avoid
> unbounded growth. In effect, statement auto-preparation is implemented in
> the driver (a sketch of the opt-in follows the list below). I know that
> the JDBC driver also implements such a mechanism (it was actually the
> inspiration for the Npgsql feature). The issues with this are:
>
> 1. As above, this has to be implemented by every driver (and is quite
> complex to do well).
> 2. There's a possible missed opportunity in not sharing a single plan on
> the server, as each connection currently has its own (the "global plan"
> option). Many apps out there send the same statements across many
> connections, so this seems relevant - but I don't know if the gains
> outweigh the contention impact in PostgreSQL.
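>
> For reference, the opt-in looks roughly like this from the application's
> side (a sketch; "Max Auto Prepare" and "Auto Prepare Min Usages" are the
> connection string keywords Npgsql exposes for this feature, and the
> values below are only examples):
>
>     using Npgsql;
>
>     class AutoPrepareDemo
>     {
>         // "Max Auto Prepare" caps the number of auto-prepared statements
>         // (the LRU budget); "Auto Prepare Min Usages" is the execution
>         // count after which a statement is prepared automatically.
>         const string ConnString =
>             "Host=localhost;Username=app;Password=app;Database=app;" +
>             "Max Auto Prepare=20;Auto Prepare Min Usages=5";
>
>         static void Main()
>         {
>             using (var conn = new NpgsqlConnection(ConnString))
>             {
>                 conn.Open();
>                 // No explicit Prepare() anywhere: once the same SQL has
>                 // been executed often enough on this physical connection,
>                 // Npgsql prepares it behind the scenes and later
>                 // executions reuse the server-side statement.
>                 for (var i = 0; i < 10; i++)
>                 {
>                     using (var cmd = new NpgsqlCommand("SELECT 1", conn))
>                         cmd.ExecuteScalar();
>                 }
>             }
>         }
>     }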
>
> Finally, since quite a few (most?) other databases include autopreparation
> (SQL Server, Oracle...), users porting their applications - which don't
> explicitly prepare - experience a big performance drop. It can rightly be
> said that porting an application across databases isn't a trivial task and
> that adjustments need to be made, but from experience I can say that
> PostgreSQL is losing quite a few users to this.
>
> The above issues could be helped by having PostgreSQL cache plans on its
> side (especially the second issue, which is the most important). Ideally,
> any adopted solution would be transparent and not require any modification
> to applications. It would also not impact explicitly-prepared statements
> in any way.
>
> Note that I'm not arguing for any specific implementation on the
> PostgreSQL side (e.g. global or not), but just describing a need and hoping
> to restart a conversation that will lead somewhere.
>
> (and thanks for reading this overly long message!)
>
> Shay
>
