Re: foreign table batch inserts

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Manuel Kniep <m(dot)kniep(at)web(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: foreign table batch inserts
Date: 2016-05-18 16:27:33
Message-ID: CAMsr+YHE8Rt800yWcHEL8SrgruK0ng_nBmtKV6YMZ2BAzRBZzw@mail.gmail.com
Lists: pgsql-hackers

On 18 May 2016 at 06:08, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

> > Wouldn’t it make sense to do the inserts batch-wise, e.g. 100 rows?
>
> Using a single query string with multiple values, perhaps, but then
> the query string length limit comes into consideration, particularly
> for large text values... The query used for the insertion has been a
> prepared statement since writable foreign tables were introduced in
> 9.3, which actually keeps the code quite simple.
>

This should be done the way PgJDBC does batches. It'd require a libpq
enhancement, but it's one we IMO need anyway: allow pipelined query
execution from libpq.

[design follows]

What this should be doing is:

- send Parse to create an unnamed prepared statement; then
- loop, and:
  - send a Bind and an Execute for the query with values if the send
    buffer is not full
  - if there are no more values to send, send a Sync message
  - receive and process results if the receive buffer is not empty
  - check each result and mark it off against the list of dispatched
    queries
  - if an ERROR is received, bail out
  - if a ReadyForQuery (the server's reply to our Sync) is received,
    check that all results have been retrieved as expected, then
    return OK
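
In C-ish terms that loop could look something like the sketch below.
PQqueueQueryPrepared() and PQsendSync() are the hypothetical no-Sync
variants proposed further down; next_row(), no_more_rows(),
results_pending() and drain_available_results() stand in for
caller-side bookkeeping. PQsocket(), PQflush() and PQconsumeInput() are
existing libpq API. A sketch, not a patch:

/*
 * Sketch of the interleaved send/receive loop described above.
 */
#include <stdbool.h>
#include <sys/select.h>
#include <libpq-fe.h>

static void
pipeline_batch(PGconn *conn)
{
    bool    sending = true;     /* rows still to dispatch? */

    while (sending || results_pending())
    {
        fd_set  rfds, wfds;
        int     sock = PQsocket(conn);

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &rfds);
        if (sending || PQflush(conn) == 1)
            FD_SET(sock, &wfds);
        if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
            return;             /* error handling elided */

        if (sending && FD_ISSET(sock, &wfds))
        {
            /* Bind + Execute for the next row; no Sync yet */
            PQqueueQueryPrepared(conn, "", 1, next_row(),
                                 NULL, NULL, 0);
            if (no_more_rows())
            {
                PQsendSync(conn);       /* end of batch: sync point */
                sending = false;
            }
        }

        if (FD_ISSET(sock, &rfds))
        {
            if (!PQconsumeInput(conn))
                return;         /* connection broke: bail out */
            /* mark results off the dispatch list; bail on ERROR */
            drain_available_results(conn);
        }
    }
}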

This would require libpq to be smarter about how it tracks queries.
Right now it keeps track of the current query, query results, etc.
directly in the connection object, and it sends a Sync after each
operation, then expects to wait in a busy state until it gets the
results from that operation.

Instead we'd have to have a FIFO queue of messages libpq expects
responses for. Variants of PQsendPrepare, PQsendQueryPrepared,
PQsendDescribePrepared, etc. would not send a Sync message; instead of
setting the current query on the connection, they'd append an entry to
the expected-result queue. They'd still mark the connection as busy, so
no non-queue-aware calls could be run until the queue is consumed and
empty.
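
As a rough illustration of the bookkeeping (all types and names here
are mine, purely illustrative, not a concrete proposal), the queue
might look like:

/*
 * Sketch only: one possible shape for the expected-result FIFO.
 * None of these types exist in libpq today.
 */
typedef enum PGqueuedOpType
{
    PGQUEUED_PREPARE,           /* expecting ParseComplete */
    PGQUEUED_EXECUTE,           /* expecting CommandComplete / rows */
    PGQUEUED_DESCRIBE,          /* expecting ParameterDescription etc. */
    PGQUEUED_SYNC               /* expecting ReadyForQuery */
} PGqueuedOpType;

typedef struct PGqueuedOp
{
    PGqueuedOpType      type;
    struct PGqueuedOp  *next;   /* singly linked FIFO */
} PGqueuedOp;

/*
 * PGconn would grow head/tail pointers for the queue; the PQsend*
 * variants append to the tail, and result processing pops the head.
 */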

These functions might return some kind of handle value that can be used to
identify the queue entry they created; it'd be pretty useless at the
moment, but needed if we ever get "cancel queries up to X" functionality on
the protocol or if we later added buffering of multiple query results.

A new PQsendSync or similar would be added to send a synchronisation point,
which would go into the FIFO. Clients would call that after enqueueing a
batch of work, e.g. after sending a commit for a batched xact. That's
required for error recovery.

Clients would use PQgetResult as before. When it returns null, they'd
call a new PQnextResult(...) function to initiate processing of the next
operation's input; this would pop the next operation from the FIFO, or
return null if there's nothing more in the queue. PQisBusy would return
true until there are no items left in the queue.

We'd still use the connection object for result sets, fetching rows,
etc., as there can still be only one "current" query for which a
response is being received from the server. Nothing much would change
with PQgetResult etc. There wouldn't be any PQgetResult variant to wait
for the results of the nth query or for some kind of query handle,
because we need the client to consume the results of all prior queries.
The client must process query results in FIFO order. We could have
per-query result buffers, etc., but it seems pretty pointless; the
client can do this for itself if it wants.

If the server sends an error, libpq would pop queries off the queue
until it reaches the Sync entry, and consume input on the socket until
it sees the server's ReadyForQuery for that Sync on the wire.
PQgetResult for each queued operation skipped this way would return a
state indicating that it didn't execute because of an error in a prior
operation.
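
From the client's side, draining a failed batch might look like this
sketch. PQnextResult() is the proposed function above, and
PGRES_BATCH_ABORTED is a name I'm inventing here for the "skipped due
to a prior error" state:

/*
 * Sketch: consume every queued operation's results after a mid-batch
 * error. PQnextResult() and PGRES_BATCH_ABORTED are hypothetical.
 */
do
{
    PGresult   *res;

    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_FATAL_ERROR:
                fprintf(stderr, "operation failed: %s",
                        PQresultErrorMessage(res));
                break;
            case PGRES_BATCH_ABORTED:   /* hypothetical status */
                /* never executed: an earlier operation errored out
                 * before the batch's Sync point */
                break;
            default:
                break;
        }
        PQclear(res);
    }
} while (PQnextResult(conn));           /* pop the next queued op */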

Such an API would benefit immensely from the "cancel up to"
functionality we discussed here recently; without it, it's hard to
cancel anything reliably and to know exactly what you're cancelling. It
doesn't strictly need it, though: the cancellation problem isn't much
worse than it is now.

If we wanted to allow batch execution from the sync API we'd need a new
function that takes a prepared query and an array of values and manages the
send and receive buffer polling using the async API internally, since we
need to use nonblocking sockets to avoid deadlocking.
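
Signature-wise, such a wrapper might look like the following; this is
entirely hypothetical, the name and shape are mine:

/*
 * Hypothetical synchronous convenience wrapper: executes the named
 * prepared statement once per row of 'paramValues', pipelining
 * internally over a nonblocking socket, and returns one PGresult per
 * row. Not real libpq; the signature is illustrative only.
 */
PGresult  **PQexecPreparedBatch(PGconn *conn,
                                const char *stmtName,
                                int nParams,
                                int nRows,
                                const char *const *paramValues[],
                                int *resultCount);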

I don't think this would look that different from current libpq code to
the user. Ignoring the details of error handling on command dispatch,
etc., the app would just call a series of PQqueuePrepare,
PQqueueQueryPrepared, etc. (bikeshed as desired), then PQsendSync(...).
Then it'd call PQgetResult until it returns null, call PQnextResult(...)
and resume calling PQgetResult(...). Repeat until PQnextResult(...)
returns null, and check that the most recent result was a PGRES_SYNC_OK,
which is what we'd return from processing a PQsendSync(...).
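
Roughly, then (again a sketch: every PQqueue*, PQsendSync, PQnextResult
and PGRES_SYNC_OK here is from this proposal, not existing libpq):

/*
 * Sketch of the proposed calling pattern for a batched insert.
 * Only PQgetResult, PQresultStatus and PQclear exist today.
 */
PQqueuePrepare(conn, "ins", "INSERT INTO t(v) VALUES ($1)", 1, NULL);
for (int i = 0; i < nrows; i++)
{
    const char *values[1] = {rows[i]};

    PQqueueQueryPrepared(conn, "ins", 1, values, NULL, NULL, 0);
}
PQsendSync(conn);

/* Consume results strictly in FIFO order. */
ExecStatusType last = PGRES_EMPTY_QUERY;

do
{
    PGresult   *res;

    while ((res = PQgetResult(conn)) != NULL)
    {
        last = PQresultStatus(res);
        PQclear(res);
    }
} while (PQnextResult(conn));   /* advance to the next operation */

if (last != PGRES_SYNC_OK)      /* the PQsendSync() result */
    handle_batch_failure(conn);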

If the client wants to be totally nonblocking it can do the
PQconsumeInput and PQflush dance as normal. It's strongly preferable for
the client to use non-blocking writes, because otherwise it risks a
deadlock where the server and client are both blocked on writes: the
client is blocked writing to its send buffer, which the server will
never consume because the server is itself blocked writing results to
its own send buffer, which the client won't consume because it's
blocked. It's still safe to pipeline writes in blocking mode if you know
you'll never write anything close to the send buffer's size before you
send a Sync and switch to reading results, though.
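
For reference, the write side of that dance needs nothing new; with
today's libpq it looks roughly like this (real API throughout, error
handling abbreviated):

#include <sys/select.h>
#include <libpq-fe.h>

/*
 * Nonblocking flush: keep pushing the send buffer out while also
 * draining the server's responses, so neither side blocks on write.
 */
static int
flush_nonblocking(PGconn *conn)
{
    int     rc;

    while ((rc = PQflush(conn)) == 1)   /* 1: data still unsent */
    {
        fd_set  rfds, wfds;
        int     sock = PQsocket(conn);

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &rfds);
        FD_SET(sock, &wfds);
        if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
            return -1;

        /* If the server has sent us something, consume it so its own
         * send buffer drains and it can keep reading our inserts. */
        if (FD_ISSET(sock, &rfds) && !PQconsumeInput(conn))
            return -1;
    }
    return rc;                          /* 0 on success, -1 on error */
}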

If we had this in libpq, FDWs could just prepare an insert then send the
data values in an efficient, pipelined manner. It's not quite as fast as
COPY, but it's a whole lot faster than the current round-trip-heavy
approach, and unlike COPY it can be used for update/delete too.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
