Re: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-06-29 05:00:28
Message-ID: CA+HiwqFejKHaAQRvKqdWCJ+NAgsK+Sz6N46Ct_0sLte5hB0nTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> One of the issues I'm fairly regularly reminded by users/customers is
> that inserting into tables sharded using FDWs are rather slow. We do
> even get it reported on pgsql-bugs from time to time [1].
>
> Some of the slowness / overhead is expected, doe to the latency between
> machines in the sharded setup. Even just 1ms latency will make it way
> more expensive than a single instance.
>
> But let's do a simple experiment, comparing a hash-partitioned regular
> partitions, and one with FDW partitions in the same instance. Scripts to
> run this are attached. The duration of inserting 1M rows to this table
> (average of 10 runs on my laptop) looks like this:
>
> regular: 2872 ms
> FDW: 64454 ms
>
> Yep, it's ~20x slower. On setup with ping latency well below 0.05ms.
> Imagine how would it look on sharded setups with 0.1ms or 1ms latency,
> which is probably where most single-DC clusters are :-(
>
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.
>
> For some use cases this may be reduced by having many independent
> connnections from different users, so the per-user latency is higher but
> acceptable. But if you need to import larger amounts of data (say, a CSV
> file for analytics, ...) this may not work.
>
> Some time ago I wrote an ugly PoC adding batching, just to see how far
> would it get us, and it seems quite promising - results for he same
> INSERT benchmarks look like this:
>
> FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice indeed.

> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.

I think you may want to take a look this recent proposal by Andrey Lepikhov:

* [POC] Fast COPY FROM command for the table with foreign partitions *
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-06-29 05:20:59 [PATCH] Performance Improvement For Copy From Binary Files
Previous Message Masahiko Sawada 2020-06-29 04:55:56 Re: Resetting spilled txn statistics in pg_stat_replication