Re: Slowness of extended protocol

From: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, sfrost(at)snowman(dot)net, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de, stark(at)mit(dot)edu, ishii(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Slowness of extended protocol
Date: 2016-08-16 14:16:45
Message-ID: CAB=Je-GOTaqp8gyv2jdP=AB_EqmkfNMwkbWyOr2_hOHXJYGu3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Shay> your analogy breaks down. Of course L2 was invented to improve
performance,
Shay> but that doesn't mean that all caches are the same. More precisely, what I
Shay> find objectionable about your approach is not any caching - it's the
Shay> implicit or automatic preparation of statements. This practice isn't
Shay> invisible in that a) it may cause errors that wouldn't have been there
Shay> otherwise (e.g. because of DDL),

Long-lived named server-prepared statements cause problems even if
server-prepared statements are created manually by developers.

Could you please stop saying "automatic preparation causes ~DDL issues"?

Those errors are not inherent to "automatic preparation of statements"
Those are just database defects that need to be cured.

Automatic savepointing is just a workaround for current DB limitation, and
it provides users with a simplified migration path.

Please, don't try to tell me that "IDbCommand.Prepare()" documentation says
that "prepared statement might fail for no reason just because it is prepared".

Shay> As I said above, I think this is a critical point of misunderstand between
Shay> us. The developers tells the driver which statements should be
Shay> server-prepared by calling .prepareStatement(). I'm guessing you have a
Shay> totally different understanding here.

Please, quote the document you got that "developers tell the driver which
statements should be server-prepared by calling ..." from. It never
works like that.
Neither in Java, nor in C#. I would admit I've no C# experience, but I did
find documentation on IDbCommand.Prepare() and examined it.

The proper way to say is "by calling .Prepare() developer passes the
intention that
he might be using the same query multiple times".
That is it. It never means "driver must absolutely use server-prepare
in the response
to .Prepare() call".

The same goes for Java's PreparedStatement.
It never means "the driver must use server-prepared features".

As Microsoft lists in the .Prepare() documentation, modern versions of
MSSQL just ignore .Prepare() and cache statements automatically.

It is not a developer's business which statements should be in the
database cache.
Neither developer should care which statements reside in the driver cache.

Shay> What exactly does "server-prepare on each execution" means? Sending Parse
Shay> on each execution? How can that be considered prepared at all?

Remember, java.sql.PreparedStatement interface is NOT bound to PostgreSQL in any
manner. It is a generic database API.
Thus the word "prepared" does not mean anything specific there.
It gives no promise whether the statement will use "parseOnce,
execMany" PostgreSQL's
feature or not.

A compliant implementation (that is a driver) could just assemble full SQL
by concatenating the parameters on each execution and send it via 'Q' simple
execute message.

Shay> Does pgjdbc consider
Shay> something "prepared" without it being the 2nd option above? Note that I'm
Shay> genuinely interested in case I'm missing something.

Currently the first 5 executions of PreparedStatement use unnamed
statements (Parse/Bind/Exec).
Then pgjdbc assigns a name and uses just Bind/Exec.

So if a particular SQL is rare, then it would not get its own
server-prepared name
even though it is "java.sql.PreparedStatement".

What pgjdbc does is it picks the most used queries and enables them to be cached
at the database level.

Vladimir>> app/component and assign variables to CPU registers.
Vladimir>> This is exactly "programmer knowledge" which the compiler
doesn't have.
Vladimir>> Does it sound good to you?

Shay> Of course not. But I don't think it's a very valid analogy.

The analogy was not supposed to play in a way
you twisted it with ORM=Java, driver=C, etc.

Here's more detailed explanation:

1) You claim that programmers should manually examine all the SQL statements,
and put ".prepare()" call if and only if the specific SQL should be
server-prepared.

2) My analogy: programmers should manually examine all the variables
(think of C#
variables, or Java variables, or C variables, it does not matter),
and assign which variables should use CPU registers, and which ones should
go into the memory.

Named server-prepared statements == CPU registers
SQL statements in the code == variables in the code (e.g. C# variables)

That is very valid analogy. What you say is "programmer has full visibility over
the meaning of the code, thus it knows better which statements should be
server-prepared and which should not".

Well, register allocation is a bit harder problem that "statement name
allocation",
but the essence is the same: there's limited number of registers/named
statements,
so someone (or something) should decide which statements deserve a name.

Just in case: you definitely know what CPU registers are and what is
"register allocation" problem, don't you?

You seem to pick up that "application developer != compiler engineer", however
then you slipped into "nevertheless application developer should
decide if each SQL
should be server-prepared on a case-by-case basis".

In ideal world that might work. However, that does not work in practice:

0) Microsoft's documentation on ".Prepare()" says "prepare call does not have
any effect since sql server optimizes the statements automatically".
So applications ported from MSSQL might miss ".Prepare()" altogether,
thus it will
under-perform and blame PostgreSQL for no reason.
Does it sound fair? I do not think so.

Do you see lots of complaints for "MSSQL doing automatic statement caching"?
I'm not into MSSQL, but I know Oracle does automatic statement caching as well,
and no one ever raises questions like "my precious statement was evicted".

1) What if the database RAM is increased, so the application can
".prepare()" not 20, but
2000 statements?
Should developer add more ".prepare()" calls and recompile the application
just to accommodate RAM increase? That sounds crazy. I think the way
to go should be
just increase "statement cache size".

2) Do you really expect each project to have a list of all the used SQL along
with "should server-prepare" flags?

3) What if the application is composed of 5 components? How should one split
the budget of server-prepared statements? (remember, we cannot prepare
all of them
otherwise the DB would crash)

4) Database driver does see all the SQLs executed within a connection.
Of course, driver's automatic cache cannot beat "ideal pick", however
that automatic statement can easily be "just fine" and
automatic caching saves enormous amount of development time.

Remember, C# compiler often is "just fine" at CPU register allocation,
and you don't need to control how the code is actually executed.

That spare time would better be spent on SQL tuning (that is "explain
analyze...")
rather than deciding which statements should be server-prepared and
which should not.

5) When porting from other database, the application might already be full of
"con.prepareStatement(String sql)" calls. As I said, there is NO
NonPreparedStatement
in Java.
Does that mean an application ported from say Oracle should crash when running
against PostgreSQL? That does not sound right.

Shay> Whereas it's totally
Shay> unreasonable to expect programmers to do the work of the compiler
Shay> optimizer, it seems really reasonable to ask them to think about server
Shay> preparation, and even about multivalue inserts (at least if performance is
Shay> important in the application).

a) They don't even know that multivalues in PostgreSQL gives such a boost
b) By the time all the ORMs implement that feature, PostgreSQL might
fix that defect
so the optimization will become obsolete.

Once again: I do not suggest rewriting SELECTs on the fly yet.
What I suggest is to decide "which are the most important SQLs and which of them
deserve their own server-prepared name" at the driver level.

Remember: it is very close to what compiler is doing. It assigns
registers automatically
based on some heuristics. For typical applications it is just fine.
No-one is trying to

Shay> and b) it imposes a resource drain on the
Shay> server. The second point is very important: the L2 cache doesn't impose a
Shay> resource drain on anyone - it's just there, speeding up your
application. I
Shay> hope that point makes it across - it's why I don't accept your analogy.

Oh. You might be not into a chip design either, that is why that
analogy plays so bad.
Just in case: if L2 cache was just there, and if L2 cache didn't
impose a resource
drain, why don't we have 1GiB L2 caches?

Of course L2 drains lots of resources:
1) It drains money, as L2 costs valuable on-die space
2) L2 drains power, that is naturally drains your battery
Of course there are good reasons to have L2 even though it drains resources, but
the statement of "it is just there for free" is plain wrong.

Shay> I'm still
Shay> not sure if it should be opt-in or default, although I admit I'm leaning
Shay> towards default. But that feature has very little to do with *implicit*
Shay> preparation.

Shay> The point is that AFAIK the same bugs that can result from reordering can
Shay> also result from other basic conditions as well. If you're writing
Shay> multithreaded code then you must handle synchronization - this is not a
Shay> reordering-specific problem. Therefore if your program is
multithreaded but
Shay> doesn't do proper synchronization you have a bug - regardless of whether
Shay> its manifestation is triggered by CPU reordering or not. I admit
I'm not an
Shay> expert on this and may be wrong (it would be interesting to know).

Regarding "regardless...by CPU or not", you are off.
I've listed a link to a concurrency bug in Linux kernel.
It manifested itself only on some recent Intel CPUs.
In other words, the kernel was fine otherwise, but if running modern
CPU, it might hang.
Of course it was bug in the kernel, but the trigger condition was
"smarter CPU" that
was able to do more "reorderings" that were possible with older ones.

There are good articles by Alexey Shipilev where he describes what are
the typical
concurrency problems, and why do compiler engineers allow them to appear.
The articles are java-related, however I think they are good read anyway:
https://shipilev.net/blog/2014/jmm-pragmatics
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind

Shay> It does only if you do it in a roundtrip of its own. When you close a
Shay> pooled connection in Npgsql, the reset query is written to an internal
Shay> buffer but not sent. The first query that actually gets sent by the user
Shay> after opening will therefore have the reset query prepended to it
Shay> (basically the reset query is batched)

That is clever.
Is it something specific to Npgsql pool? Does it work with all poolers
that operate
on top of Npgsql connections?

Vladimir

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-16 14:16:56 Re: Let's get rid of the separate minor version numbers for shlibs
Previous Message Stephen Frost 2016-08-16 14:06:39 Re: patch proposal