Re: PATCH: Batch/pipelining support for libpq

From: Andres Freund <andres(at)anarazel(dot)de>
To: Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "Prabakaran, Vaishnavi" <VaishnaviP(at)fast(dot)au(dot)fujitsu(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Dmitry Igrishin <dmitigr(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Manuel Kniep <m(dot)kniep(at)web(dot)de>, "fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "Iwata, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2017-08-23 09:40:53
Message-ID: 20170823094053.wug6junk35346ajz@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some
> >more intelligence is going to be needed.
>
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.

> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
>
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.

> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...

> + <para>
> + Much like asynchronous query mode, there is no performance disadvantage to
> + using batching and pipelining. It increases client application complexity
> + and extra caution is required to prevent client/server deadlocks but
> + can sometimes offer considerable performance improvements.
> + </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

> + <para>
> + Batching is most useful when the server is distant, i.e. network latency
> + (<quote>ping time</quote>) is high, and when many small operations are being performed in
> + rapid sequence. There is usually less benefit in using batches when each
> + query takes many multiples of the client/server round-trip time to execute.
> + A 100-statement operation run on a server 300ms round-trip-time away would take
> + 30 seconds in network latency alone without batching; with batching it may spend
> + as little as 0.3s waiting for results from the server.
> + </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.

> + <para>
> + Use batches when your application does lots of small
> + <literal>INSERT</literal>, <literal>UPDATE</literal> and
> + <literal>DELETE</literal> operations that can't easily be transformed into
> + operations on sets or into a
> + <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> + </para>

Aren't SELECTs also a major beneficiarry of this?

> + <para>
> + Batching is less useful when information from one operation is required by the
> + client before it knows enough to send the next operation.

s/less/not/

> + <note>
> + <para>
> + The batch API was introduced in PostgreSQL 10.0, but clients using PostgresSQL 10.0 version of libpq can
> + use batches on server versions 8.4 and newer. Batching works on any server
> + that supports the v3 extended query protocol.
> + </para>
> + </note>

Where's the 8.4 coming from?

> + <para>
> + The client uses libpq's asynchronous query functions to dispatch work,
> + marking the end of each batch with <function>PQbatchSyncQueue</function>.
> + And to get results, it uses <function>PQgetResult</function> and
> + <function>PQbatchProcessQueue</function>. It may eventually exit
> + batch mode with <function>PQexitBatchMode</function> once all results are
> + processed.
> + </para>
> +
> + <note>
> + <para>
> + It is best to use batch mode with <application>libpq</application> in
> + <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> + blocking mode it is possible for a client/server deadlock to occur. The
> + client will block trying to send queries to the server, but the server will
> + block trying to send results from queries it has already processed to the
> + client. This only occurs when the client sends enough queries to fill its
> + output buffer and the server's receive buffer before switching to
> + processing input from the server, but it's hard to predict exactly when
> + that'll happen so it's best to always use non-blocking mode.
> + </para>
> + </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbound memory? We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.

> + <sect3 id="libpq-batch-sending">
> + <title>Issuing queries</title>
> +
> + <para>
> + After entering batch mode the application dispatches requests
> + using normal asynchronous <application>libpq</application> functions such as
> + <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
> + <function>PQsendQueryPrepared</function>, <function>PQsendDescribePortal</function>,
> + <function>PQsendDescribePrepared</function>.
> + The asynchronous requests are followed by a <link
> + linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link> call to mark
> + the end of the batch. The client <emphasis>does not</emphasis> need to call
> + <function>PQgetResult</function> immediately after dispatching each
> + operation. <link linkend="libpq-batch-results">Result processing</link>
> + is handled separately.
> + </para>
> +
> + <para>
> + Batched operations will be executed by the server in the order the client
> + sends them. The server will send the results in the order the statements
> + executed. The server may begin executing the batch before all commands
> + in the batch are queued and the end of batch command is sent. If any
> + statement encounters an error the server aborts the current transaction and
> + skips processing the rest of the batch. Query processing resumes after the
> + end of the failed batch.
> + </para>

Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...

> + <note>
> + <para>
> + The client must not assume that work is committed when it
> + <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> + corresponding result is received to confirm the commit is complete.
> + Because errors arrive asynchronously the application needs to be able to
> + restart from the last <emphasis>received</emphasis> committed change and
> + resend work done after that point if something goes wrong.
> + </para>
> + </note>

This seems fairly independent of batching.

> + </sect3>
> +
> + <sect3 id="libpq-batch-interleave">
> + <title>Interleaving result processing and query dispatch</title>
> +
> + <para>
> + To avoid deadlocks on large batches the client should be structured around
> + a nonblocking I/O loop using a function like <function>select</function>,
> + <function>poll</function>, <function>epoll</function>,
> + <function>WaitForMultipleObjectEx</function>, etc.
> + </para>
> +
> + <para>
> + The client application should generally maintain a queue of work still to
> + be dispatched and a queue of work that has been dispatched but not yet had
> + its results processed.

Hm. Why? If queries are just issued, no such queue is required?

> When the socket is writable it should dispatch more
> + work. When the socket is readable it should read results and process them,
> + matching them up to the next entry in its expected results queue. Batches
> + should be scoped to logical units of work, usually (but not always) one
> + transaction per batch. There's no need to exit batch mode and re-enter it
> + between batches or to wait for one batch to finish before sending the next.
> + </para>

This really needs to take memory usage into account.

> + </variablelist>
> +
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQenterBatchMode">
> + <term>
> + <function>PQenterBatchMode</function>
> + <indexterm>
> + <primary>PQenterBatchMode</primary>
> + </indexterm>
> + </term>
> +
> + <listitem>
> + <para>
> + Causes a connection to enter batch mode if it is currently idle or
> + already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> + </para>
> + <para>
> + Returns 1 for success. Returns 0 and has no
> + effect if the connection is not currently idle, i.e. it has a result
> + ready, is waiting for more input from the server, etc. This function
> + does not actually send anything to the server, it just changes the
> + <application>libpq</application> connection state.
> +
> + </para>
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQexitBatchMode">
> + <term>
> + <function>PQexitBatchMode</function>
> + <indexterm>
> + <primary>PQexitBatchMode</primary>
> + </indexterm>
> + </term>
> +
> + <listitem>
> + <para>
> + Causes a connection to exit batch mode if it is currently in batch mode
> + with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> + </para>
> + <para>Returns 1 for success.
> + Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?

> + <varlistentry id="libpq-PQbatchSyncQueue">
> + <term>
> + <function>PQbatchSyncQueue</function>
> + <indexterm>
> + <primary>PQbatchSyncQueue</primary>
> + </indexterm>
> + </term>
> +
> + <listitem>
> + <para>
> + Delimits the end of a set of a batched commands by sending a <link
> + linkend="protocol-flow-ext-query">sync message</link> and flushing
> + the send buffer. The end of a batch serves as
> + the delimiter of an implicit transaction and
> + an error recovery point; see <link linkend="libpq-batch-errors">
> + error handling</link>.

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.

> + <varlistentry id="libpq-PQbatchQueueCount">
> + <term>
> + <function>PQbatchQueueCount</function>
> + <indexterm>
> + <primary>PQbatchQueueCount</primary>
> + </indexterm>
> + </term>
> +
> + <listitem>
> + <para>
> + Returns the number of queries still in the queue for this batch, not
> + including any query that's currently having results being processed.
> + This is the number of times <function>PQbatchProcessQueue</function> has to be
> + called before the query queue is empty again.
> +
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
> + </para>
> + </listitem>
> + </varlistentry>

Given that apps are supposed to track this, I'm not sure why we have
this?

> +/*
> + * PQrecyclePipelinedCommand
> + * Push a command queue entry onto the freelist. It must be a dangling entry
> + * with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> + if (entry == NULL)
> + return;
> + if (entry->next != NULL)
> + {
> + fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> + abort();

Don't think we use abort() in libpq like that. There's some Assert()s
tho.

> static bool
> PQsendQueryStart(PGconn *conn)
> @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn)
> libpq_gettext("no connection to the server\n"));
> return false;
> }
> - /* Can't send while already busy, either. */
> - if (conn->asyncStatus != PGASYNC_IDLE)
> + /* Can't send while already busy, either, unless enqueuing for later */
> + if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == PQBATCH_MODE_OFF)
> {
> printfPQExpBuffer(&conn->errorMessage,
> libpq_gettext("another command is already in progress\n"));
> return false;
> }
>
> - /* initialize async result-accumulation state */
> - pqClearAsyncResult(conn);
> + if (conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + /*

Weirdly indented.

> + if (conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + pipeCmd = PQmakePipelinedCommand(conn);
> +
> + if (pipeCmd == NULL)
> + return 0; /* error msg already set */
> +
> + last_query = &pipeCmd->query;
> + queryclass = &pipeCmd->queryclass;
> + }
> + else
> + {
> + last_query = &conn->last_query;
> + queryclass = &conn->queryclass;
> + }

The amount of complexity / branches we're adding to all of these is more
than a bit unsightly.

> +/*
> + * PQbatchQueueCount
> + * Return number of queries currently pending in batch mode
> + */
> +int
> +PQbatchQueueCount(PGconn *conn)
> +{
> + int count = 0;
> + PGcommandQueueEntry *entry;
> +
> + if (PQbatchStatus(conn) == PQBATCH_MODE_OFF)
> + return 0;
> +
> + entry = conn->cmd_queue_head;
> + while (entry != NULL)
> + {
> + ++count;
> + entry = entry->next;
> + }
> + return count;
> +}

Ugh, O(N)? In that case I'd rather just remove this.

> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.

> + * Put an idle connection in batch mode. Commands submitted after this
> + * can be pipelined on the connection, there's no requirement to wait for
> + * one to finish before the next is dispatched.
> + *
> + * Queuing of new query or syncing during COPY is not allowed.

+"a"?

> + * A set of commands is terminated by a PQbatchQueueSync. Multiple sets of batched
> + * commands may be sent while in batch mode. Batch mode can be exited by
> + * calling PQbatchEnd() once all results are processed.
> + *
> + * This doesn't actually send anything on the wire, it just puts libpq
> + * into a state where it can pipeline work.
> + */
> +int
> +PQenterBatchMode(PGconn *conn)
> +{
> + if (!conn)
> + return false;

true/false isn't quite in line with int return code.

> +/*
> + * PQbatchEnd

wrong name.

> + * End batch mode and return to normal command mode.
> + *
> + * Has no effect unless the client has processed all results
> + * from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard to diagnose errors.

> + * Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> + if (!conn)
> + return false;
> +
> + if (conn->batch_status == PQBATCH_MODE_OFF)
> + return true;
> +
> + switch (conn->asyncStatus)
> + {
> + case PGASYNC_IDLE:
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext_noop("internal error, IDLE in batch mode"));
> + break;
> + case PGASYNC_COPY_IN:
> + case PGASYNC_COPY_OUT:
> + case PGASYNC_COPY_BOTH:
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext_noop("internal error, COPY in batch mode"));
> + break;

So we'll still check the queue in this case, that's a bit weird?

> +/*
> + * PQbatchQueueSync

Out of sync.

> + * End a batch submission by sending a protocol sync. The connection will
> + * remain in batch mode and unavailable for new non-batch commands until all
> + * results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).

> + * It's legal to start submitting another batch immediately, without waiting
> + * for the results of the current batch. There's no need to end batch mode
> + * and start it again.
> + *
> + * If a command in a batch fails, every subsequent command up to and including
> + * the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + * whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + * produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?

> +/*
> + * PQbatchQueueProcess

Out of sync.

> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.

> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{

> + /* This command's results will come in immediately.
> + * Initialize async result-accumulation state */
> + pqClearAsyncResult(conn);

I'm not following?

> /*
> * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)

> + if (conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + /*
> + * batched queries aren't followed by a Sync to put us back in
> + * PGASYNC_IDLE state, and when we do get a sync we could
> + * still have another batch coming after this one.

This needs rephrasing.

> + * The connection isn't idle since we can't submit new
> + * nonbatched commands. It isn't also busy since the current
> + * command is done and we need to process a new one.
> + */
> + conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.

> + if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> + return false;
> + }

Why do we need the PGASYNC_QUEUED test here?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> + return(pqFlush(conn));
> + return 0; /* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.

Falling over now. This seems like enough feedback for a bit of work
anyway.

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2017-08-23 11:13:04 Re: Partition-wise aggregation/grouping
Previous Message Simon Riggs 2017-08-23 09:12:11 Re: Explicit relation name in VACUUM VERBOSE log