Re: dynamic result sets support in extended query protocol

From: Dave Cramer <davecramer(at)postgres(dot)rocks>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic result sets support in extended query protocol
Date: 2020-10-09 18:39:38
Message-ID: CADK3HHJS_PgjYAk+cCN4pZm69qWz82jLtPnn8E3wTgqbY-coLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 9 Oct 2020 at 13:33, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
> On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> > I want to progress work on stored procedures returning multiple result
> > sets. Examples of how this could work on the SQL side have previously
> > been shown [0]. We also have ongoing work to make psql show multiple
> > result sets [1]. This appears to work fine in the simple query
> > protocol. But the extended query protocol doesn't support multiple
> > result sets at the moment [2]. This would be desirable to be able to
> > use parameter binding, and also since one of the higher-level goals
> > would be to support the use case of stored procedures returning
> > multiple result sets via JDBC.
> >
> > [0]:
> >
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> > [1]: https://commitfest.postgresql.org/29/2096/
> > [2]:
> > https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
> >
> > (Terminology: I'm calling this project "dynamic result sets", which
> > includes several concepts: 1) multiple result sets, 2) those result
> > sets can have different structures, 3) the structure of the result
> > sets is decided at run time, not declared in the schema/procedure
> > definition/etc.)
> >
> > One possibility I rejected was to invent a third query protocol beside
> > the simple and extended one. This wouldn't really match with the
> > requirements of JDBC and similar APIs because the APIs for sending
> > queries don't indicate whether dynamic result sets are expected or
> > required, you only indicate that later by how you process the result
> > sets. So we really need to use the existing ways of sending off the
> > queries. Also, avoiding a third query protocol is probably desirable
> > in general to avoid extra code and APIs.
> >
> > So here is my sketch on how this functionality could be woven into the
> > extended query protocol. I'll go through how the existing protocol
> > exchange works and then point out the additions that I have in mind.
> >
> > These additions could be enabled by a _pq_ startup parameter sent by
> > the client. Alternatively, it might also work without that because
> > the client would just reject protocol messages it doesn't understand,
> > but that's probably less desirable behavior.
> >
> > So here is how it goes:
> >
> > C: Parse
> > S: ParseComplete
> >
> > At this point, the server would know whether the statement it has
> > parsed can produce dynamic result sets. For a stored procedure, this
> > would be declared with the procedure definition, so when the CALL
> > statement is parsed, this can be noticed. I don't actually plan any
> > other cases, but for the sake of discussion, perhaps some variant of
> > EXPLAIN could also return multiple result sets, and that could also be
> > detected from parsing the EXPLAIN invocation.
> >
> > At this point a client would usually do
> >
> > C: Describe (statement)
> > S: ParameterDescription
> > S: RowDescription
> >
> > New would be that the server would now also respond with a new
> > message, say,
> >
> > S: DynamicResultInfo
> >
> > that indicates that dynamic result sets will follow later. The
> > message would otherwise be empty. (We could perhaps include the
> > number of result sets, but this might not actually be useful, and
> > perhaps it's better not to spent effort on counting things that don't
> > need to be counted.)
> >
> > (If we don't guard this by a _pq_ startup parameter from the client,
> > an old client would now error out because of an unexpected protocol
> > message.)
> >
> > Now the normal bind and execute sequence follows:
> >
> > C: Bind
> > S: BindComplete
> > (C: Describe (portal))
> > (S: RowDescription)
> > C: Execute
> > S: ... (DataRows)
> > S: CommandComplete
> >
> > In the case of a CALL with output parameters, this "primary" result
> > set contains one row with the output parameters (existing behavior).
> >
> > Now, if the client has seen DynamicResultInfo earlier, it should now
> > go into a new subsequence to get the remaining result sets, like this
> > (naming obviously to be refined):
> >
> > C: NextResult
> > S: NextResultReady
> > C: Describe (portal)
> > S: RowDescription
> > C: Execute
> > ....
> > S: CommandComplete
> > C: NextResult
> > ...
> > C: NextResult
> > S: NoNextResult
> > C: Sync
> > S: ReadyForQuery
> >
> > I think this would all have to use the unnamed portal, but perhaps
> > there could be other uses with named portals. Some details to be
> > worked out.
> >
> > One could perhaps also do without the DynamicResultInfo message and
> > just put extra information into the CommandComplete message indicating
> > "there are more result sets after this one".
> >
> > (Following the model from the simple query protocol, CommandComplete
> > really means one result set complete, not the whole top-level command.
> > ReadyForQuery means the whole command is complete. This is perhaps
> > debatable, and interesting questions could also arise when considering
> > what should happen in the simple query protocol when a query string
> > consists of multiple commands each returning multiple result sets.
> > But it doesn't really seem sensible to cater to that.)
> >
> > One thing that's missing in this sequence is a way to specify the
> > desired output format (text/binary) for each result set. This could
> > be added to the NextResult message, but at that point the client
> > doesn't yet know the number of columns in the result set, so we could
> > only do it globally. Then again, since the result sets are dynamic,
> > it's less likely that a client would be coded to set per-column output
> > codes. Then again, I would hate to bake such a restriction into the
> > protocol, because some is going to try. (I suspect what would be more
> > useful in practice is to designate output formats per data type.) So
> > if we wanted to have this fully featured, it might have to look
> > something like this:
> >
> > C: NextResult
> > S: NextResultReady
> > C: Describe (dynamic) (new message subkind)
> > S: RowDescription
> > C: Bind (zero parameters, optionally format codes)
> > S: BindComplete
> > C: Describe (portal)
> > S: RowDescription
> > C: Execute
> > ...
> >
> > While this looks more complicated, client libraries could reuse
> > existing code that starts processing with a Bind message and continues
> > to CommandComplete, and then just loops back around.
> >
> > The mapping of this to libpq in a simple case could look like this:
> >
> > PQsendQueryParams(conn, "CALL ...", ...);
> > PQgetResult(...); // gets output parameters
> > PQnextResult(...); // new: sends NextResult+Bind
> > PQgetResult(...); // and repeat
> >
> > Again, it's not clear here how to declare the result column output
> > formats. Since libpq doesn't appear to expose the Bind message
> > separately, I'm not sure what to do here.
> >
> > In JDBC, the NextResult message would correspond to the
> > Statement.getMoreResults() method. It will need a bit of conceptual
> > adjustment because the first result set sent on the protocol is
> > actually the output parameters, which the JDBC API returns separately
> > from a ResultSet, so the initial CallableStatement.execute() call will
> > need to process the primary result set and then send NextResult and
> > obtain the first dynamic result as the first ResultSet for its API,
> > but that can be handled internally.
> >
> > Thoughts so far?
> >
>
>
> Exciting stuff. But I'm a bit concerned about the sequence of
> resultsets. The JDBC docco for CallableStatement says:
>
> A CallableStatement can return one ResultSet object or multiple
> ResultSet objects. Multiple ResultSet objects are handled using
> operations inherited from Statement.
>
> For maximum portability, a call's ResultSet objects and update
> counts should be processed prior to getting the values of output
> parameters.
>
> And this is more or less in line with the pattern that I've seen when
> converting SPs from other systems - the OUT params are usually set at
> the end with things like status flags and error messages.
>
> If the OUT parameter resultset has to come first (which is how I read
> your proposal - please correct me if I'm wrong) we'll have to stack up
> all the resultsets until the SP returns, then send the OUT params, then
> send the remaining resultsets. That seems ... suboptimal. The
> alternative would be to send the OUT params last. That might result in
> the driver needing to do some lookahead and caching, but I don't think
> it's unmanageable. Of course, your protocol would also need changing.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
Currently the JDBC driver does NOT do :

At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription

We do not do the Describe until we use a named statement and decide that
the extra round trip is worth it.

Making this assumption will cause a performance regression on all queries.

If we are going to make a protocol change there are a number of other
things the drivers want.
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

Thanks,

Dave

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-10-09 18:46:25 Re: dynamic result sets support in extended query protocol
Previous Message Andrey Borodin 2020-10-09 18:08:42 Re: Batching page logging during B-tree build