Re: foreign table batch inserts

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Manuel Kniep <m(dot)kniep(at)web(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: foreign table batch inserts
Date: 2016-05-20 15:18:17
Message-ID: CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 May 2016 at 15:35, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

>
> You can, however, omit Sync from between messages and send a series of
> protocol messages, like
>
> Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync
>
> to avoid round-trip overheads.
>
>
I implemented what I think is a pretty solid proof of concept of this for
kicks this evening. Attached, including basic test program. Patch attached.
The performance difference over higher latency links is huge, see below.

Demo/test program in src/test/examples/testlibpqbatch.c.

github: https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

I still need to add the logic for handling an error during a batch by
discarding all input until the next Sync, but otherwise I think it's pretty
reasonable.

The time difference for 10k inserts on the local host over a unix socket
shows a solid improvement:

batch insert elapsed: 0.244293s
sequential insert elapsed: 0.375402s

... but over, say, a connection to a random AWS RDS instance fired up for
the purpose that lives about 320ms away the difference is huge:

batch insert elapsed: 9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a
bit less then an hour based on the latency numbers)

With 500 rows on the remote AWS RDS instance, once the I/O quota is already
saturated:

batch insert elapsed: 1.229024s
sequential insert elapsed: 156.962180s

which is an improvement by a factor of over 120

I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't
let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without
temp tables and a bunch of hoop jumping anyway. If COPY solved everything
there'd be no point having pipelining.

No docs yet, but if folks think the interface is reasonable I can add them
easily since the comments on each of the new functoins should be easy to
adapt into the SGML docs.

With a bit of polishing I think this can probably go in the next CF, though
I only wrote it as an experiment. Can I get opinions on the API?

The TL;DR API, using the usual async libpq routines, is:

PQbeginBatchMode(conn);

PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);

PQsendPrepare(conn, "my_update", "UPDATE ...");

PQsetnonblocking(conn, 1);

while (!all_responses_received)
{
select(...)

if (can-write)
{
if (app-has-more-data-to-send)
{
PQsendQueryPrepared(conn, "my_update", params-go-here);
}
else if (havent-sent-commit-yet)
{
PQsendQueryParams(conn, "COMMIT", ...);
}
else if (havent-sent-endbatch-yet)
{
PqEndBatch(conn);
}
PQflush(conn);
}

if (can-read)
{
PQconsumeInput(conn);
if (PQisBusy(conn))
continue;
res = PQgetResult(conn);
if (res == NULL)
{
PQgetNextQuery(conn);
continue;
}
/* process results in the same order we sent the commands */
/* client keeps track of that, libpq just supplies the results */
...
}
}

PQendBatch(conn);

Note that:

* PQsendQuery cannot be used as it uses simple query protocol, use
PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared,
PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined, you don't have to wait for one to end
to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Draft-of-libpq-async-pipelining-support.patch text/x-patch 44.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ildar Musin 2016-05-20 15:29:00 Re: Declarative partitioning
Previous Message Ronan Dunklau 2016-05-20 13:41:38 Possible regression regarding estimating relation width in FDWs