More protocol discussion: breaking down query processing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: More protocol discussion: breaking down query processing
Date: 2003-04-02 23:15:58
Message-ID: 24009.1049325358@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-interfaces

Last month's discussions about a revised FE/BE protocol make it clear that
there is usefulness in breaking down the query-processing cycle and making
individual steps available at the protocol level. This surfaced in
several forms in the discussion, including requests for protocol-level
PREPARE functionality, concerns about the inefficiency of fetching a
RowDescription with each row pulled from a cursor, etc. Here are some
thoughts about how to do this.

There are two intermediate objects involved in processing a query: a
parsed query and a cursor (perhaps we can find better names for them).
The parsed query contains all the info needed to execute a query, except
for actual values of any parameter placeholders used in the query.
A cursor is a parsed query instantiated with actual parameter values;
it's ready to execute and return rows (if the query returns any --- a
"cursor" for an UPDATE, say, wouldn't return anything).

Query processing can then be broken down into several steps. The PARSE
step takes a query text string, and optionally some indication of the
types of the parameters needed, and produces a parsed query object (this
is exactly equivalent to the existing PREPARE command's functionality).
Parameter types are induced from the query if not specified. The BIND
step takes a parsed-query object and some actual parameter values and
produces a ready-to-execute cursor. The FETCH step fetches some or all
rows from a cursor. One may CLOSE a cursor when done with it, and may
DEALLOCATE a parsed query object when it's no longer needed. We will also
want a DESCRIBE operation that can be applied to either cursors or parsed
queries (these perhaps should be thought of as two separate operations).
Describing a cursor yields info about the output columns (equivalent to
RowDescription message in the present protocol). Describing a parsed
query yields RowDescription plus some description of the parameter(s).

The existing Q"sql statement" command can now be thought of as a macro
that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from
cursor, CLOSE, DEALLOCATE.

We will still offer that all-in-one functionality, but we'll also offer an
extended query command that allows any combination of these functions to
be requested at the protocol level.

These protocol operations are interoperable with the SQL-level commands
for prepared queries and cursors, in the sense that a prepared query or
open cursor can be created either way and then used either way.
(Alternatively we might think that this is a bad idea, and that
protocol-level operations should use a different namespace from SQL
commands, so that application-requested operations can't interfere with
state that the client-side library has set up. Any opinions?)

The BIND operation will allow actual parameter values to be sent in either
text or binary form; in either case we'll use a bytecount-followed-by-data
format so that it's fully eight-bit-clean.

I am inclined to think also that the FETCH operation should allow
specification of whether it wants the results in text or binary form.
It's not real clear how that should interoperate with the existing
DECLARE BINARY CURSOR command, though --- which one should win if there's
a conflict?

All of these functions can specify the relevant parsed-query and cursor
objects by name. A named parsed-query object lives until end of session
unless deallocated. A named cursor object lives until end of transaction
unless closed. (This might be extended to allow holdable cursors, though
I'm more inclined to say that such things can only be created by SQL
commands.) Alternatively, the protocol functions can reference nameless
(zero-length-name) parsed-query and cursor objects. There can be only one
of each of these, living only till the next one is created. The system
can optimize nameless objects a little since it knows it need not save
them beyond the end of the current query. The all-in-one query command
always uses nameless objects.

A difficulty with this scheme is that it doesn't scale very well to
querystrings containing more than one SQL statement. For the all-in-one
query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle
is automatically repeated for each parsetree constructed from the string;
this is backwards-compatible with what happens now. It seems fairly messy
to extend that to the case where the steps are individually commanded ...
especially if you want to assume that the steps are being commanded by a
client library that hasn't parsed the querystring and doesn't know how
many commands exist therein. Perhaps we can get away with disallowing
multiple commands per string when using the extended protocol.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Pflug 2003-04-02 23:20:01 Re: 7.4devel auth failed
Previous Message Dann Corbit 2003-04-02 23:11:54 Re: contrib and licensing

Browse pgsql-interfaces by date

  From Date Subject
Next Message Bruce Badger 2003-04-03 00:06:00 FE/BE - Side effects
Previous Message Manfred Koizar 2003-04-02 12:51:57 Re: Oracle Porting, Compiere