Re: Slowness of extended protocol

From: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Slowness of extended protocol
Date: 2016-08-09 15:42:16
Message-ID: CAB=Je-FP1rM14fx20CZ62GP0jS9aGJiuS9GRVLrCvTPuE3UPjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Shay>But here's the more important general point. We're driver developers,
not application developers. I don't really know what performance is "just
fine" for each of my users, and what is not worth optimizing further. Users
may follow best practices, or they may not for various reasons.

Of course we cannot benchmark all the existing applications, however we
should at lest try to use "close to production" benchmarks.

Let me recap: even "select 1" shows clear advantage of reusing
server-prepared statements.
My machine shows the following results for "select 1 pgbench":
simple: 38K ops/sec (~26us/op)
extended: 32K ops/sec (~31us/op)
prepared: 47K ops/sec (~21us/op)

Note: reusing server-prepared statements shaves 10us (out of 31us), while
"brand new ParseBindExecDeallocate" message would not able to perform
better than 26us/op (that is 5 us worse than the prepared one). So it makes
much more sense investing time in "server-prepared statement reuse" at the
client side and "improving Bind/Exec performance" at the backend side.

For more complex queries the gap (prepared vs simple) would be much bigger
since parse/validate/plan for a complex query is much harder operation than
the one for "select 1"

Note: I do not mean "look, prepared always win". I mean: "if your
environment does not allow reuse of prepared statements for some reason,
you lose huge amount of time on re-parsing the queries, and it is worth
fixing that obvious issue first".

Shay>I don't see how reusing SQL text affects security in any way.

Reusing SQL text makes application more secure as "build SQL on the fly" is
prone to SQL injection security issues.
So reusing SQL text makes application more secure and it enables
server-prepared statements that improve performance considerably. It is a
win-win.

Shay>a new feature in the Npgsql dev branch which allows prepared
statements to be persisted across open/close on pooled connections

Do you have some benchmark results for "reusing server-prepared statements
across open/close on pooled"? I would expect that feature to be a great win.

Once again, I'd like to focus on real-life cases, not artificial ones.

For example, the major part of my performance fixes to pgjdbc were made as
a part of improving my java application that was suffering from performance
issues when talking to PostgreSQL.
For instance, there were queries that took 20ms to plan and 0.2ms to
execute (the query is like where id=? but the SQL text was more involved).
As transparent server-side statement was implemented at pgjdbc side, it
shaved those 20ms by eliminating Parse messages on the hot path.

In other words, it was not just "lets optimize pgjdbc". It was driven by
the need to optimize the client application, and the profiling results were
pointing to pgjdbc issues.

Shay>Again, in a world where prepared statements aren't persisted across
connections (e.g. pgbouncer)

pgbouncer does not properly support named statements, and that is
pbgouncer's issue.

Here's the issue for pgbouncer project:
https://github.com/pgbouncer/pgbouncer/issues/126#issuecomment-200900171
The response from pgbouncer team is "all the protocol bits are there, it is
just implementation from pgbouncer that is missing".

By the way: I do not use pgbouncer, thus there's no much interest for me to
invest time in fixing pgbouncer's issues.

Shay>Any scenario where you open a relatively short-lived connection and
execute something once is problematic - imagine a simple web service which
needs to insert a single record into a table.

I would assume the application does not use random string for a table name
(and columns/aliases), thus it would result in typical SQL text reuse, thus
it should trigger "server-side statement prepare" logic. In other way, that
kind of application does not need the "new ParseBindExecDeallocate message
we are talking about".

In other words, if an application is using "select name from objects where
id=$1" kind of queries, the driver should be using extended protocol
(Bind/Exec) behind the scenes if it does aim to get high performance.

Vladimir

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-08-09 15:56:23 Re: No longer possible to query catalogs for index capabilities?
Previous Message Tom Lane 2016-08-09 15:26:29 Re: No longer possible to query catalogs for index capabilities?