Re: PATCH: Batch/pipelining support for libpq

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Matthieu Garrigues <matthieu(dot)garrigues(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Dave Cramer <davecramer(at)postgres(dot)rocks>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2020-11-03 17:20:53
Message-ID: CAKFQuwZTNgNwnHdxQnQnOpKfyX5KW6twgj23ZNUZByDVgmdi7w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> On 2020-Nov-02, Alvaro Herrera wrote:
>
> > In v23 I've gone over docs; discovered that PQgetResults docs were
> > missing the new values. Added those. No significant other changes yet.
>
>
I've just been reading the documentation of this patch and haven't been
following the longer thread:

Given the caveats around blocking-mode connections, why not just require
non-blocking mode, similar to how synchronous functions are disallowed?

"Batched operations will be executed by the server in the order the client
sends them. The server will send the results in the order the statements
executed."

Maybe:

"The server executes statements, and returns results, in the order the
client sends them."

Using two sentences and relying on the user to mentally link the two "in
the order" descriptions together seems to add unnecessary cognitive load.

+ The client <link linkend="libpq-batch-interleave">interleaves result
+ processing</link> with sending batch queries, or for small batches may
+ process all results after sending the whole batch.

Suggest: "The client may choose to interleave result processing with
sending batch queries, or wait until the complete batch has been sent."

I would expect to process the results of a batch only after sending the
entire batch to the server. That I don't have to is informative, but
knowing when I should avoid doing so, and why, would be informative as
well. Taken to the extreme, if you use batch mode but poll getResult
after every command, you make the whole batch pointless. Directing the
reader from here to the section "Interleaving Result Processing and Query
Dispatch" seems worth considering. The dynamics of batch sizes and socket
buffers also remain a bit unclear: what will break (if anything, or is it
just process memory on the server) if interleaving is not performed and
batches are large?

I would suggest placing the commentary about "all transactions subsequent
to a failed transaction in a batch are ignored while previously completed
transactions are retained" in the "When to Use Batching" section.
Something like: "Batching is less useful, and more complex, when a single
batch contains multiple transactions (see Error Handling)."

My imagined use case would be to open a batch, start a transaction, send
all of its components, end the transaction, end the batch, check for batch
failure, and if it doesn't fail have the option to easily continue without
processing individual PGresults (or, if it does fail, have the option to
extract the first error PGresult and continue, ignoring the rest, knowing
that the transaction as a whole was reverted and the batch unapplied).

I've never interfaced with libpq directly, though given how the existing C
API works, what is implemented here seems consistent.

The "queueing up queries into a pipeline to be executed as a batch on the
server" wording can be read as describing client-side behavior where
nothing is sent to the server until the batch has been completed. Reading
further, it becomes clear that it is basically a server-side toggle that
instructs the server to continue processing incoming commands even while
prior commands have their results waiting to be ingested by the client.

Batch seems like the user-visible term to describe this feature. Pipeline
seems like an implementation detail that doesn't need to be mentioned in
the documentation, especially given that "pipeline" isn't mentioned beyond
the first two paragraphs of the chapter, and never without being linked
directly to "batch". I would probably leave the indexterm and add a
paragraph describing that batching is implemented using a query pipeline,
so that people with the implementation detail on their mind can find this
chapter, but the prose for the user should just stick to batching.

Sorry, that is all a bit unfocused, but the documentation for the user of
the API could be cleaned up a bit, and some more words could be spent on
what trade-offs are being made when using batching versus normal
command-response processing. That said, while I don't see all of this as
purely a matter of style, I'm also not seeing anything demonstrably wrong
with the documentation at the moment. Hopefully my perspective helps,
though, and depending on what happens next I may try to make my thoughts
more concrete with an actual patch.

David J.
