dynamic result sets support in extended query protocol

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: dynamic result sets support in extended query protocol
Date: 2020-10-08 07:46:38
Message-ID: 6e747f98-835f-2e05-cde5-86ee444a7140@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I want to progress work on stored procedures returning multiple result
sets. Examples of how this could work on the SQL side have previously
been shown [0]. We also have ongoing work to make psql show multiple
result sets [1]. This appears to work fine in the simple query
protocol. But the extended query protocol doesn't support multiple
result sets at the moment [2]. This would be desirable to be able to
use parameter binding, and also since one of the higher-level goals
would be to support the use case of stored procedures returning multiple
result sets via JDBC.

[0]:
https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us

(Terminology: I'm calling this project "dynamic result sets", which
includes several concepts: 1) multiple result sets, 2) those result sets
can have different structures, 3) the structure of the result sets is
decided at run time, not declared in the schema/procedure definition/etc.)

One possibility I rejected was to invent a third query protocol beside
the simple and extended one. This wouldn't really match with the
requirements of JDBC and similar APIs because the APIs for sending
queries don't indicate whether dynamic result sets are expected or
required, you only indicate that later by how you process the result
sets. So we really need to use the existing ways of sending off the
queries. Also, avoiding a third query protocol is probably desirable in
general to avoid extra code and APIs.

So here is my sketch on how this functionality could be woven into the
extended query protocol. I'll go through how the existing protocol
exchange works and then point out the additions that I have in mind.

These additions could be enabled by a _pq_ startup parameter sent by the
client. Alternatively, it might also work without that because the
client would just reject protocol messages it doesn't understand, but
that's probably less desirable behavior.

So here is how it goes:

C: Parse
S: ParseComplete

At this point, the server would know whether the statement it has parsed
can produce dynamic result sets. For a stored procedure, this would be
declared with the procedure definition, so when the CALL statement is
parsed, this can be noticed. I don't actually plan any other cases, but
for the sake of discussion, perhaps some variant of EXPLAIN could also
return multiple result sets, and that could also be detected from
parsing the EXPLAIN invocation.

At this point a client would usually do

C: Describe (statement)
S: ParameterDescription
S: RowDescription

New would be that the server would now also respond with a new message, say,

S: DynamicResultInfo

that indicates that dynamic result sets will follow later. The message
would otherwise be empty. (We could perhaps include the number of
result sets, but this might not actually be useful, and perhaps it's
better not to spent effort on counting things that don't need to be
counted.)

(If we don't guard this by a _pq_ startup parameter from the client, an
old client would now error out because of an unexpected protocol message.)

Now the normal bind and execute sequence follows:

C: Bind
S: BindComplete
(C: Describe (portal))
(S: RowDescription)
C: Execute
S: ... (DataRows)
S: CommandComplete

In the case of a CALL with output parameters, this "primary" result set
contains one row with the output parameters (existing behavior).

Now, if the client has seen DynamicResultInfo earlier, it should now go
into a new subsequence to get the remaining result sets, like this
(naming obviously to be refined):

C: NextResult
S: NextResultReady
C: Describe (portal)
S: RowDescription
C: Execute
....
S: CommandComplete
C: NextResult
...
C: NextResult
S: NoNextResult
C: Sync
S: ReadyForQuery

I think this would all have to use the unnamed portal, but perhaps there
could be other uses with named portals. Some details to be worked out.

One could perhaps also do without the DynamicResultInfo message and just
put extra information into the CommandComplete message indicating "there
are more result sets after this one".

(Following the model from the simple query protocol, CommandComplete
really means one result set complete, not the whole top-level command.
ReadyForQuery means the whole command is complete. This is perhaps
debatable, and interesting questions could also arise when considering
what should happen in the simple query protocol when a query string
consists of multiple commands each returning multiple result sets. But
it doesn't really seem sensible to cater to that.)

One thing that's missing in this sequence is a way to specify the
desired output format (text/binary) for each result set. This could be
added to the NextResult message, but at that point the client doesn't
yet know the number of columns in the result set, so we could only do it
globally. Then again, since the result sets are dynamic, it's less
likely that a client would be coded to set per-column output codes.
Then again, I would hate to bake such a restriction into the protocol,
because some is going to try. (I suspect what would be more useful in
practice is to designate output formats per data type.) So if we wanted
to have this fully featured, it might have to look something like this:

C: NextResult
S: NextResultReady
C: Describe (dynamic) (new message subkind)
S: RowDescription
C: Bind (zero parameters, optionally format codes)
S: BindComplete
C: Describe (portal)
S: RowDescription
C: Execute
...

While this looks more complicated, client libraries could reuse existing
code that starts processing with a Bind message and continues to
CommandComplete, and then just loops back around.

The mapping of this to libpq in a simple case could look like this:

PQsendQueryParams(conn, "CALL ...", ...);
PQgetResult(...); // gets output parameters
PQnextResult(...); // new: sends NextResult+Bind
PQgetResult(...); // and repeat

Again, it's not clear here how to declare the result column output
formats. Since libpq doesn't appear to expose the Bind message
separately, I'm not sure what to do here.

In JDBC, the NextResult message would correspond to the
Statement.getMoreResults() method. It will need a bit of conceptual
adjustment because the first result set sent on the protocol is actually
the output parameters, which the JDBC API returns separately from a
ResultSet, so the initial CallableStatement.execute() call will need to
process the primary result set and then send NextResult and obtain the
first dynamic result as the first ResultSet for its API, but that can be
handled internally.

Thoughts so far?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2020-10-08 08:11:48 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Dilip Kumar 2020-10-08 07:37:04 Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables