Re: why do we need two snapshots per query?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why do we need two snapshots per query?
Date: 2011-11-14 00:37:29
Message-ID: CA+TgmoYJKfnMrtMhODwhNoj1jwcgzs_H1R70erCEcrWJM65DUQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 13, 2011 at 6:45 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Nov14, 2011, at 00:13 , Robert Haas wrote:
>> On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> In that case you must be of the opinion that extended query protocol
>>> is a bad idea and we should get rid of it, and the same for prepared
>>> plans of all types.  What you're basically proposing is that simple
>>> query mode will act differently from other ways of submitting a query,
>>> and I don't think that's a good idea.
>>
>> I don't see why anything I said would indicate that we shouldn't have
>> prepared plans.  It is useful for users to have the option to parse
>> and plan before execution - especially if they want to execute the
>> same query repeatedly - and if they choose to make use of that
>> functionality, then we and they will have to deal with the fact that
>> things can change between plan time and execution time.
>
> The problem, or at least what I perceived to be the problem, is that
> protocol-level support for prepared plans isn't the only reason to use
> the extended query protocol. The other reasons are protocol-level control
> over text vs. binary format, and out-of-line parameters.
>
> In my experience, it's hard enough as it is to convince developers to
> use statement parameters instead of interpolating them into the SQL
> string. Once word gets out that the simple protocol now has less locking
> overhead than the extended protocol, it's going to get even harder...

Well, if our goal in life is to allow people to have protocol control
over text vs. binary format and support out-of-line parameters without
requiring multiple protocol messages, we can build that facility into
the next version of the protocol. I know Kevin's been thinking about
working on that project for a number of reasons, and this would be a
good thing to get on the list.

On the other hand, if our goal in life is to promote the extended
query protocol over the simple query protocol at all costs, then I
agree that we shouldn't optimize the simple query protocol in any way.
Perhaps we should even post a big notice on it that says "this
facility is deprecated and will be removed in a future version of
PostgreSQL". But why should that be our goal? Presumably our goal is
to put forward the best technology, not to artificially pump up one
alternative at the expense of some other one. If the simple protocol
is faster in certain use cases than the extended protocol, then let
people use it. I wouldn't have noticed this optimization opportunity
in the first place but for the fact that psql seems to use the simple
protocol - why does it do that, if the extended protocol is
universally better? I suspect that, as with many other things where
we support multiple alternatives, the best alternative depends on the
situation, and we should let users pick depending on their use case.

At any rate, if you're concerned about the relative efficiency of the
simple query protocol versus the extended protocol, it seems that the
horse has already left the barn. I just did a quick 32-client pgbench
-S test on a 32-core box. This is just a thirty-second run, but
that's enough to make the point: if you're not using prepared queries,
using the extended query protocol incurs a significant penalty - more
than 15% on this test:

[simple] tps = 246808.409932 (including connections establishing)
[extended] tps = 205609.438247 (including connections establishing)
[prepared] tps = 338150.881389 (including connections establishing)
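
For anyone who wants to check the "more than 15%" figure, here is a minimal sketch of the arithmetic. It uses only the three tps numbers above; the helper name relative_change is made up for illustration and is not part of pgbench.

```python
# Throughput figures copied from the pgbench run above (transactions per second).
tps = {
    "simple": 246808.409932,
    "extended": 205609.438247,
    "prepared": 338150.881389,
}


def relative_change(baseline: float, value: float) -> float:
    """Return the change from baseline to value as a fraction of baseline."""
    return (value - baseline) / baseline


# Negative means slower than the simple protocol, positive means faster.
extended_vs_simple = relative_change(tps["simple"], tps["extended"])
prepared_vs_simple = relative_change(tps["simple"], tps["prepared"])

print(f"extended vs. simple: {extended_vs_simple:+.1%}")
print(f"prepared vs. simple: {prepared_vs_simple:+.1%}")
```

Taking the simple protocol as the baseline, the unprepared extended protocol comes out a bit under 17% slower, and prepared queries roughly 37% faster. This is a single thirty-second run, so the numbers are indicative rather than precise.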

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
