Re: Slowness of extended protocol

From: Shay Rojansky <roji(at)roji(dot)org>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Slowness of extended protocol
Date: 2016-08-11 14:18:17
Message-ID: CADT4RqCkc6cbRfvTQCqgy2N75Z-DA-LHZhPfTJj1e74wxcnr7g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 11, 2016 at 1:22 PM, Vladimir Sitnikov
<sitnikov(dot)vladimir(at)gmail(dot)com> wrote:

> 2) The driver can use savepoints and autorollback to the known-good
> "right before failure" state in case of a known failure. Here's the
> implementation: https://github.com/pgjdbc/pgjdbc/pull/477
>
> As far as I can remember, performance overheads are close to zero (no
> extra roundtrips to create a savepoint)

What? Do you mean you do implicit savepoints and autorollback too? How does
the driver decide when to do a savepoint? Is it on every single command? If
not, can't commands get lost when an error is raised and you automatically
roll back? And if you do a savepoint on every single command, surely that
would impact performance even without extra roundtrips...?
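
For reference, here is roughly what savepoint-per-statement with
autorollback would look like at the JDBC API level - a minimal sketch under
my own assumptions, not taken from PR #477 (which presumably pipelines the
SAVEPOINT internally instead of going through the public API):

    import java.sql.*;

    class AutosaveSketch {
        // Hypothetical savepoint-per-statement with autorollback. Names
        // and flow are illustrative only.
        static ResultSet runWithAutosave(Connection conn, String sql, int id)
                throws SQLException {
            conn.setAutoCommit(false);
            // One SAVEPOINT per statement: no extra roundtrip if pipelined,
            // but an extra message and extra server-side work every time.
            Savepoint sp = conn.setSavepoint("autosave");
            PreparedStatement ps = conn.prepareStatement(sql);
            ps.setInt(1, id);
            try {
                return ps.executeQuery();
            } catch (SQLException e) {
                // Roll back to the pre-failure state instead of poisoning
                // the whole transaction; the caller can then retry.
                conn.rollback(sp);
                throw e;
            }
        }
    }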

You seem to have a very "unique" idea of what a database driver should do
under-the-hood for its users. At the very least I can say that your concept
is very far from almost any database driver I've seen up to now (PostgreSQL
JDBC, psycopg, Npgsql, libpq...). I'm not aware of other drivers that
implicitly prepare statements, and I definitely know of no drivers that
internally create savepoints and roll them back without explicit user APIs.
You should at least be aware (and also clearly admit!) that you're doing
something very different - not necessarily wrong - and not attempt to
impose your ideas on everyone as if yours were the only true way to write a
db driver.

> 3) Backend could somehow invalidate prepared statements, and notify
> clients accordingly. Note: the problem is hard in a generic case, however
> it might not be that hard if we fix well-known, often-used cases like "a
> column is added". That, however, would add memory overheads to store
> additional maps like "table_oid -> statement_names[]"

Assuming your driver supports batching/pipelining (and I hope it does),
that doesn't make sense. Say I send a 2-statement batch, where the first
statement is a DDL and the second is some prepared query that the DDL
invalidates. By the time PostgreSQL executes the DDL and sends your
hypothetical notification, the second query - the one that should be
invalidated - has already been sent to the server (because of batching),
and boom.
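
To make the race concrete, here is a hypothetical wire trace of that batch
(the message names are from the v3 protocol; the "invalidation notice" is
the hypothetical part):

    -- "q2" was prepared earlier in the session as "SELECT * FROM t".
    -- The whole batch goes out before any reply is read:
    F: Parse/Bind/Execute   ALTER TABLE t ADD COLUMN c int
    F: Bind "q2"            <-- already on the wire when the ALTER runs
    F: Execute
    F: Sync
    B: CommandComplete      ALTER TABLE
    B: [invalidation notice for "q2" would go here - too late]
    B: ErrorResponse        cached plan must not change result type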

> 4) Other. For instance, a new message flow so frontend and backend could
> re-negotiate "binary vs text formats for the new resulting type". Or a
> "ValidatePreparedStatement" message that would just validate the
> statement and avoid killing the transaction if the statement is invalid.
> Or whatever else can be invented.

When would you send this ValidatePreparedStatement? Before each execution
as a separate roundtrip? That would kill performance. Would you send it in
the same packet before Execute? In that case you still get the error when
Execute is evaluated...
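
Sketching both placements (again, ValidatePreparedStatement doesn't exist;
this is purely illustrative):

    -- Placement A: a separate roundtrip before every execution:
    F: ValidatePreparedStatement "q2"
    F: Sync                          ...client blocks for the verdict...
    B: (valid / invalid)
    F: Bind "q2" / Execute / Sync

    -- Placement B: same packet as the execution - the Execute is already
    -- queued right behind the validation, so nothing is gained:
    F: ValidatePreparedStatement "q2"
    F: Bind "q2" / Execute / Sync
    B: ErrorResponse                 raised at Bind/Execute anyway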

There really is no solution to this problem within the current PostgreSQL
way of doing things - although of course we could reinvent the entire
PostgreSQL protocol here to accommodate your special driver...

The basic truth is this... In every db driver I'm familiar with,
programmers are expected to manage query preparation on their own. They're
supposed to do it based on knowledge only they have, weighing pros and
cons. They have responsibility over their own code, and they don't
outsource major decisions like this to their driver. When they get an error
from PostgreSQL, it's triggered by something they did in a very explicit
and clear way - and they therefore have a good chance of understanding
what's going on. They generally don't get errors triggered by some
under-the-hood magic their driver decided to do for them, errors which are
hard to diagnose and understand. Their drivers are simple, predictable and
lean, and they don't have to include complex healing logic to deal with
errors that their own under-the-hood logic triggered (e.g. implicit
savepoints).
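
For comparison, the explicit model in plain JDBC - nothing hidden, and any
preparation-related error points back at code the programmer wrote (leaving
aside, for the sake of the sketch, pgjdbc's own prepareThreshold behavior):

    import java.sql.*;

    class ExplicitPrepare {
        static void printName(Connection conn, int userId) throws SQLException {
            // The programmer explicitly chooses what to prepare and
            // controls the statement's lifetime via try-with-resources.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT name FROM users WHERE id = ?")) {
                ps.setInt(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }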

> Shay> So the general point is that the existence of pools is problematic
> for the argument "always prepare for recurring statements".
>
> So what?
> Don't use pools that issue "discard all", or configure them accordingly.
> That's it.
> In the Java world, no widely used pool defaults to a "discard everything"
> strategy.

I don't know much about the Java world, but both pgbouncer and pgpool (the
major pools?) send DISCARD ALL by default. That is a fact, and it has
nothing to do with any bugs or issues pgbouncer may have. I'm tempted to go
look at other pools in other languages, but frankly I don't think that
would have any effect on this conversation...
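
For the record, this is the pgbouncer knob in question - DISCARD ALL is its
documented default reset query, and keeping server-side prepared statements
alive means clearing it (safe only with session pooling; check the docs for
your pgbouncer version before copying this):

    ; pgbouncer.ini (illustrative)
    [pgbouncer]
    pool_mode = session
    ; the default is: server_reset_query = DISCARD ALL
    ; an empty value skips the reset, so prepared statements survive
    server_reset_query =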
