Re: POC: postgres_fdw insert batching

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: tsunakawa(dot)takay(at)fujitsu(dot)com
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-11-30 02:34:00
Message-ID: CAGRY4nzbrMK19FQiP_DoNQbCD9rOGhkOqXTcjY7-d1A1FyHGGw@mail.gmail.com
Lists: pgsql-hackers

On Fri, 27 Nov 2020, 14:06 tsunakawa(dot)takay(at)fujitsu(dot)com,
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> Also, I'm afraid it requires major surgery or reform of executor. I
> don't want it to delay the release of reasonably good (10x)
> improvement with the synchronous interface.)

Totally sensible. If it isn't feasible without significant executor
change that's all that needs to be said.

I was afraid that'd be the case given the executor's pull flow but
just didn't know enough.

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

I know Andres did some initial work on executor parallelism and
pipelining. I should take a look.

> > But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...
>
> Wow, impressive number. I've just seen it in the beginning of the libpq pipelining thread (oh, already four years ago..!)

Yikes.

> Could you share the workload and the network latency (ping time)? I'm sorry I'm just overlooking it.

I thought I gave it at the time, along with a demo program. IIRC it
was just doing small multi-row inserts or single-row inserts. Latency
would've been a couple of hundred ms, probably; I think I ran it from
my laptop (Australia, ADSL) against a server on AWS US or EU.
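
As a rough illustration of why round-trip latency dominates a workload
like that, here is a back-of-the-envelope model. It is only a sketch:
the 200 ms round trip stands in for "a couple of hundred ms", and the
row count and the 1 ms per-row server cost are assumptions made up for
the example, not measurements from the original test.

```python
# Toy latency model; all figures are illustrative assumptions.

def synchronous_ms(n_rows, rtt_ms, per_row_ms):
    """Each INSERT waits for its reply, so every row pays a full round trip."""
    return n_rows * (rtt_ms + per_row_ms)


def pipelined_ms(n_rows, rtt_ms, per_row_ms):
    """All INSERTs are sent without waiting; the round trip is paid about once."""
    return rtt_ms + n_rows * per_row_ms


rows = 1000        # assumed number of single-row inserts
rtt_ms = 200       # assumed round-trip time
per_row_ms = 1     # assumed server-side cost per row

sync_total = synchronous_ms(rows, rtt_ms, per_row_ms)
pipe_total = pipelined_ms(rows, rtt_ms, per_row_ms)
print("synchronous:", sync_total, "ms")
print("pipelined:  ", pipe_total, "ms")
print("speedup:    ", sync_total / pipe_total)
```

The model ignores bandwidth, TCP behaviour and server-side contention,
so it only shows the shape of the effect: the higher the latency
relative to per-row work, the larger the gain from not waiting on each
reply.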

> Thank you for your (always) concise explanation.

You joke! I am many things but despite my best efforts concise is
rarely one of them.

> I'd like to check other DBMSs and your rich references for the FDW interface. (My first intuition is that many major DBMSs might not have client C APIs that can be used to implement an async pipelining FDW interface.

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non-SQL ones, especially cloud-oriented ones. For
example, DynamoDB supports at least async requests in the Java client
[3] and C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

Most things with a REST-like API can do a fair bit of concurrency,
though. Multiple async nonblocking HTTP connections can be serviced at
once, HTTP/1.1 pipelining can be used [1], or, even better, HTTP/2
streams [2].
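
To make the point about servicing several nonblocking requests at once
concrete, here is a minimal sketch using Python's asyncio. No real HTTP
is involved: fake_request is a hypothetical stand-in that just sleeps
for an assumed latency, so the only thing it demonstrates is that
overlapping the waits costs roughly one latency rather than n of them.

```python
import asyncio
import time


async def fake_request(i, latency_s):
    # Hypothetical stand-in for one nonblocking request to a remote API.
    await asyncio.sleep(latency_s)
    return i


async def run_sequential(n, latency_s):
    # Wait for each reply before issuing the next request.
    return [await fake_request(i, latency_s) for i in range(n)]


async def run_concurrent(n, latency_s):
    # Issue all requests up front and wait for them together.
    return list(await asyncio.gather(
        *(fake_request(i, latency_s) for i in range(n))))


def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential(5, 0.05))
con_result, con_time = timed(run_concurrent(5, 0.05))
print("sequential: %.2fs, concurrent: %.2fs" % (seq_time, con_time))
```

Both variants return the same results in the same order; only the
elapsed time differs. Whether a given FDW could exploit this depends on
the remote system's client API, which is the open question above.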

> (It'd be kind of you to send emails in text format. I've changed the format of this reply from HTML to text.)

I try to remember. Stupid Gmail. Sorry. On mobile it offers very
little control over format but I'll do my best when I can.

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] https://blog.restcase.com/http2-benefits-for-rest-apis/
[3] https://aws.amazon.com/blogs/developer/asynchronous-requests-with-the-aws-sdk-for-java/
[4] https://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_dynamo_d_b_1_1_dynamo_d_b_client.html#ab631edaccca5f3f8988af15e7e9aa4f0
