Re: PATCH: Batch/pipelining support for libpq

From: Matthieu Garrigues <matthieu(dot)garrigues(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Dave Cramer <davecramer(at)postgres(dot)rocks>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2020-11-12 09:46:54
Message-ID: CAJkzx4S6YTKAJrasb4hSaM8tNHNrgUk-bxB4Da7sT++D9Zq7cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

Thanks for the feedback. I did rework a bit the doc based on your
remarks. Here is the v24 patch.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 6:21 PM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>
>> On 2020-Nov-02, Alvaro Herrera wrote:
>>
>> > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > missing the new values. Added those. No significant other changes yet.
>>
>
> Just reading the documentation of this patch, haven't been following the longer thread:
>
> Given the caveats around blocking mode connections why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?
>
> "Batched operations will be executed by the server in the order the client
> sends them. The server will send the results in the order the statements
> executed."
>
> Maybe:
>
> "The server executes statements, and returns results, in the order the client sends them."
>
> Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.
>
> + The client <link linkend="libpq-batch-interleave">interleaves result
> + processing</link> with sending batch queries, or for small batches may
> + process all results after sending the whole batch.
>
> Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."
>
> I would expect to process the results of a batch only after sending the entire batch to the server. That I don't have to is informative but knowing when I should avoid doing so, and why, is informative as well. To the extreme while you can use batch mode and interleave if you just poll getResult after every command you will make the whole batch thing pointless. Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering. The dynamics of small sizes and sockets remains a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving it not performed and sizes are large.
>
> I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching". Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."
>
> My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and if it doesn't fail have the option to easily continue without processing individual pgResults (or if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied). I've never interfaced with libpq directly. Though given how the existing C API works what is implemented here seems consistent.
>
> The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed. Reading further it becomes clear that all it basically is is a sever-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.
>
> Batch seems like the user-visible term to describe this feature. Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation - especially given that pipeline doesn't get a mentioned beyond the first two paragraphs of the chapter and never without being linked directly to "batch". I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.
>
> Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit and some more words spent on what trade-offs are being made when using batching versus normal command-response processing. That said, while I don't see all of this purely a matter of style I'm also not seeing anything demonstrably wrong with the documentation at the moment. Hopefully my perspective helps though, and depending on what happens next I may try and make my thoughts more concrete with an actual patch.
>
> David J.
>

Attachment Content-Type Size
v24-libpq-batch.patch text/x-patch 86.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2020-11-12 10:16:42 Re: Asynchronous Append on postgres_fdw nodes.
Previous Message Dilip Kumar 2020-11-12 09:40:23 Re: logical streaming of xacts via test_decoding is broken