Re: PATCH: Batch/pipelining support for libpq

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Matthieu Garrigues <matthieu(dot)garrigues(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2021-02-16 01:19:27
Message-ID: CAGRY4nzf20aWm_VkAOK2QYa-H7A4wbLQHzzt9VjzTwo=4CU5_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 11 Feb 2021 at 07:51, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> On 2021-Jan-21, Alvaro Herrera wrote:
>
> > As you can see in an XXX comment in the libpq test program, the current
> > implementation has the behavior that PQgetResult() returns NULL after a
> > batch is finished and has reported PGRES_BATCH_END. I don't know if
> > there's a hard reason to do that, but I'd like to supress it because it
> > seems weird and out of place.
>
> Hello Craig, a question for you since this API is your devising. I've
> been looking at changing the way this works to prevent those NULL
> returns from PQgetResult. That is, instead of having what seems like a
> "NULL separator" of query results, you'd just get the PGRES_BATCH_END to
> signify a batch end (not a NULL result after the BATCH_END); and the
> normal PGRES_COMMAND_OK or PGRES_TUPLES_OK etc when the result of a
> command has been sent. It's not working yet so I'm not sending an
> updated patch, but I wanted to know if you had a rationale for including
> this NULL return "separator" or was it just a convenience because of how
> the code grew together.
>

The existing API for libpq actually specifies[1] that you should call
PQgetResult() until it returns NULL:

> After successfully calling PQsendQuery, call PQgetResult one or more
times to obtain the results. PQsendQuery cannot be called again (on the
same connection) until PQgetResult has returned a null pointer, indicating
that the command is done.

Similarly, in single-row mode, the existing API specifies that you should
call PQgetResult() until it returns NULL.

Also, IIRC the protocol already permits multiple result sets to be
returned, and the caller cannot safely assume that a single PQsendQuery()
will produce only a single result set. (I really should write a test
extension that exercises that and how libpq reacts to it.)

That's why I went for the NULL return. I think. It's been a while now so
it's a bit fuzzy.

I would definitely like an API that does not rely on testing for a NULL
return. Especially since NULL return can be ambiguous in the context of
row-at-a-time mode. New explicit enumerations for PGresult would make a lot
more sense.

So +1 from me for the general idea. I need to look at the patch as it has
evolved soon too.

Remember that the original patch I submitted for this was a 1-day weekend
hack and proof of concept to show that libpq could be modified to support
query pipelining (and thus batching too), so I could illustrate the
performance benefits that could be attained by doing so. I'd been aware of
the benefits and the protocol's ability to support it for some time thanks
to working on PgJDBC, but couldn't get anyone interested without some C
code to demonstrate it, so I wrote some. I am not going to argue that the
API I added for it is ideal in any way, and happy to see improvements.

The only change I would very strongly object to would be anything that
turned this into a *batch* mode without query-pipelining support. If you
have to queue all the queries up in advance then send them as a batch and
wait for all the results, you miss out on a lot of the potential
round-trip-time optimisations and you add initial latency. So long as there
is a way to "send A", "send B", "send C", "read results from A", "send D",
and there's a way for the application to associate some kind of state (an
application specific id or index, a pointer to an application query-queue
struct, whatever) so it can match queries to results ... then I'm happy.

[1]
https://www.postgresql.org/docs/current/libpq-async.html#LIBPQ-PQSENDQUERY

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-02-16 01:20:45 Re: [DOC] add missing "[ NO ]" to various "DEPENDS ON" synopses
Previous Message Andres Freund 2021-02-16 01:12:51 Re: Snapbuild woes followup