Re: PATCH: Batch/pipelining support for libpq

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "Prabakaran, Vaishnavi" <VaishnaviP(at)fast(dot)au(dot)fujitsu(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Dmitry Igrishin <dmitigr(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Manuel Kniep <m(dot)kniep(at)web(dot)de>, "fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "Iwata, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2017-09-13 05:33:02
Message-ID: CAMsr+YE2N5Am=iXPuLtNnyH_vgZSjfL40JMTSY3hnNEFZTDsaw@mail.gmail.com
Lists: pgsql-hackers

On 13 September 2017 at 13:06, Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com> wrote:

>
>
> On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>>
>>
>>
>> > I'm failing to see the benefit of allowing the user to set the
>> > PQBatchAutoFlush(true|false) property. Is it really needed?
>>
>> I'm inclined not to introduce that for now. If somebody comes up with a
>> convincing use case and numbers, we can add it later. The libpq API is set
>> in stone, so I'd rather not introduce unnecessary stuff...
>>
>>
> Thanks for reviewing the patch. Yes, OK.
>
>
>>
>>
>> > + <para>
>> > + Much like asynchronous query mode, there is no performance disadvantage to
>> > + using batching and pipelining. It increases client application complexity
>> > + and extra caution is required to prevent client/server deadlocks but
>> > + can sometimes offer considerable performance improvements.
>> > + </para>
>>
>> That's not necessarily true, is it? Unless you count always doing
>> batches of exactly size 1.
>>
>
> Client application complexity is increased in batch mode, because the
> application needs to track the state of the query queue. Results can be
> processed at any time, so the application needs to know up to which query
> the results have been consumed.
>
>

Yep. Also, the client/server deadlocks at issue here are a buffer
management issue, and deadlock is probably not exactly the right word. Your
app has to process replies from the server while it's sending queries;
otherwise it can get into a state where it has no room left in its send
buffer, but the server isn't consuming its receive buffer because the
server's own send buffer is full. To allow the system to make progress, the
client must drain its receive buffer.

This isn't an issue when using libpq normally.

PgJDBC has similar issues with its batch mode, but in PgJDBC it's much
worse because there's no non-blocking send available. In libpq you can at
least set your sending socket to non-blocking.
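A toy model may make the buffer stall concrete. The sketch below is a hypothetical simulation, not libpq code: `Pipe`, `server_step` and `run_batch` are illustrative names, and capacities are counted in whole queries/results rather than bytes. The "interleaved" client corresponds roughly to what libpq's documentation suggests for non-blocking connections: when PQflush() returns 1, consume input with PQconsumeInput() when the socket becomes read-ready instead of only waiting to write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not libpq code. Capacities are counted in whole
 * queries/results rather than bytes. */
typedef struct {
    int req_queued;   /* queries in client send buffer + server receive buffer */
    int resp_queued;  /* results in server send buffer + client receive buffer */
    int req_cap;
    int resp_cap;
} Pipe;

/* The server executes a queued query only if it has room to send the result. */
static bool server_step(Pipe *p)
{
    if (p->req_queued > 0 && p->resp_queued < p->resp_cap) {
        p->req_queued--;
        p->resp_queued++;
        return true;
    }
    return false;
}

/* Returns how many results the client read before finishing or stalling.
 * With read_while_sending == false the client reads nothing until every
 * query has been sent. */
static int run_batch(int nqueries, int req_cap, int resp_cap,
                     bool read_while_sending)
{
    assert(req_cap > 0 && resp_cap > 0);
    Pipe p = {0, 0, req_cap, resp_cap};
    int sent = 0, received = 0;

    while (received < nqueries) {
        bool progress = false;

        if (sent < nqueries && p.req_queued < p.req_cap) {
            p.req_queued++;          /* client sends the next query */
            sent++;
            progress = true;
        } else if ((read_while_sending || sent == nqueries) &&
                   p.resp_queued > 0) {
            p.resp_queued--;         /* client consumes one result */
            received++;
            progress = true;
        }
        if (server_step(&p))
            progress = true;
        if (!progress)
            break;                   /* both buffers full: the "deadlock" */
    }
    return received;
}
```

With these toy numbers, `run_batch(10, 2, 3, false)` stalls having read 0 results, while `run_batch(10, 2, 3, true)` reads all 10. A send-only client still finishes when the whole batch fits in the buffers (`run_batch(4, 2, 3, false)` returns 4), which is why the problem tends to appear only with larger batches.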

>>
>> > + <para>
>> > + Use batches when your application does lots of small
>> > + <literal>INSERT</literal>, <literal>UPDATE</literal> and
>> > + <literal>DELETE</literal> operations that can't easily be transformed into
>> > + operations on sets or into a
>> > + <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
>> > + </para>
>>
>> Aren't SELECTs also a major beneficiary of this?
>>
>
Yes, many individual SELECTs that cannot be assembled into a single more
efficient query would definitely also benefit.

> Hmm, although SELECTs also benefit from batch mode, running many SELECTs
> in batch mode can fill up memory rapidly and might not be as beneficial
> as the other operations listed.
>

Depends on the SELECT. With wide results you'll get less benefit, but even
then you can gain if you're on a high latency network. With "n+1" patterns
and similar, you'll see huge gains.
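A back-of-the-envelope latency model shows why result width and network latency pull in opposite directions. This is an assumption-laden sketch, not a measurement: each query is charged one round trip (`rtt_us`), server execution time (`exec_us`) and result transfer time (`xfer_us`); bandwidth limits and any overlap of execution with transfer are ignored, and the function names are hypothetical.

```c
#include <assert.h>

/* Toy latency model (an assumption, not a measurement). All times are in
 * microseconds. */

/* One query at a time: every query waits a full network round trip. */
static long long sequential_us(long long n, long long rtt_us,
                               long long exec_us, long long xfer_us)
{
    assert(n >= 1);
    return n * (rtt_us + exec_us + xfer_us);
}

/* Pipelined: the round trip is paid roughly once for the whole batch.
 * Conservatively assumes no overlap between execution and transfer. */
static long long pipelined_us(long long n, long long rtt_us,
                              long long exec_us, long long xfer_us)
{
    assert(n >= 1);
    return rtt_us + n * (exec_us + xfer_us);
}
```

The saving is always `(n - 1) * rtt_us`, so its relative size shrinks as `xfer_us` grows. For an "n+1"-style run of 100 small lookups over a 1 ms round trip (exec 50 µs, transfer 10 µs), the model gives 106000 µs sequentially versus 7000 µs pipelined; with wide results (transfer 5000 µs) it gives 605000 µs versus 506000 µs.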

>> Maybe note that multiple batches can be "in flight"?
>> I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
>> have a great idea, but we might want to rename...
>>
>>
> This function not only does error handling, but also sends the "Sync"
> message to the backend. In batch mode, the "Sync" message is not sent with
> every query; it is sent only via this function, to mark the end of the
> implicit transaction.
> Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of a
> better name.
>

I really do not like calling it "commit" as that conflates with a database
commit.

A batch can embed multiple BEGINs and COMMITs. It's entirely possible for
an earlier part of the batch to succeed and commit, then a later part to
fail, if that's the case. So that name is IMO wrong.
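How a Sync partitions the queue can be sketched as a toy model. It assumes the extended-query protocol rule that, after an error, the server discards further messages until the next Sync; the types and `process_queue` are hypothetical names, not part of libpq or the patch. Results come back in submission order, so the application matches each outcome to its own record of queued queries, and several Sync-terminated segments can be in flight at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a pipelined queue; names are illustrative. */
typedef enum { ENTRY_QUERY, ENTRY_SYNC } EntryKind;
typedef enum { RES_OK, RES_ERROR, RES_SKIPPED, RES_SYNC } ResultKind;

typedef struct {
    EntryKind kind;
    int will_fail;   /* queries only: nonzero if the server reports an error */
} Entry;

/* Fill out[i] with the outcome of queue[i], in submission (FIFO) order.
 * After an error, the server skips queries until the next Sync, which ends
 * the implicit transaction and resets the error state. */
static void process_queue(const Entry *queue, size_t n, ResultKind *out)
{
    assert(n == 0 || (queue != NULL && out != NULL));
    int in_error = 0;

    for (size_t i = 0; i < n; i++) {
        if (queue[i].kind == ENTRY_SYNC) {
            out[i] = RES_SYNC;
            in_error = 0;
        } else if (in_error) {
            out[i] = RES_SKIPPED;
        } else if (queue[i].will_fail) {
            out[i] = RES_ERROR;
            in_error = 1;
        } else {
            out[i] = RES_OK;
        }
    }
}
```

For a queue of query, failing query, query, Sync, query, Sync, the outcomes are OK, ERROR, SKIPPED, SYNC, OK, SYNC: the failure only affects the rest of its own segment, and work after the Sync proceeds normally.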

>>
>> > + <varlistentry id="libpq-PQbatchSyncQueue">
>> > + <term>
>> > + <function>PQbatchSyncQueue</function>
>> > + <indexterm>
>> > + <primary>PQbatchSyncQueue</primary>
>> > + </indexterm>
>> > + </term>
>>
>> I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
>> to mostly make sense from a protocol POV.
>>
>>
> Renamed to PQbatchCommitQueue.
>
>
Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?

>
>> > + * Put an idle connection in batch mode. Commands submitted after this
>> > + * can be pipelined on the connection, there's no requirement to wait for
>> > + * one to finish before the next is dispatched.
>> > + *
>> > + * Queuing of new query or syncing during COPY is not allowed.
>>
>> +"a"?
>>
>
> Hmm, can you explain the question, please? I don't understand.
>

s/of new query/of a new query/

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
