Quick Links

Re: postgres_fdw: Use COPY to speed up batch inserts

From:	"Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To:	"Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>
Cc:	"Andrew Dunstan" <andrew(at)dunslane(dot)net>, "jian he" <jian(dot)universality(at)gmail(dot)com>, "Tomas Vondra" <tomas(at)vondra(dot)me>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: postgres_fdw: Use COPY to speed up batch inserts
Date:	2026-04-07 21:00:56
Message-ID:	DHN84NSGBYYO.123WDP7K8D21D@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed Apr 1, 2026 at 12:50 PM -03, Matheus Alcantara wrote:
> On Mon Mar 30, 2026 at 4:14 PM -03, Masahiko Sawada wrote:
>>> Please see the new attached version.
>>
>> I've reviewed the v12 patch, and here are review comments:
>>
>> The COPY data sent via postgres_fdw should properly escape the input
>> data. The bug I found can be reproduced with the following scenario:
>>
>> -- on local server
>> create server remote foreign data wrapper postgres_fdw;
>> create foreign table t (a int, b text) server remote;
>> create user mapping for public server remote;
>>
>> -- on remote server
>> create table t (a int, b text);
>>
>> -- on local server
>> copy t(a, d) from stdin;
>> Enter data to be copied followed by a newline.
>> End with a backslash and a period on a line by itself, or an EOF signal.
>>>> 1 hello\nworld
>>>> \.
>> ERROR: invalid input syntax for type integer: "world"
>> CONTEXT: COPY t, line 2, column a: "world"
>> remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT)
>>
>
> I think that we need something like CopyAttributeOutText() here.
>
> To fix this I've added appendStringInfoText() which is a similar version
> of CopyAttributeOutText() that works with a StringInfo. I did not find
> any function that I could reuse here, if such function exists please let
> me know.
>
> I'm wondering if we should have this similar function or try to combine
> both to avoid duplicated logic, although it looks complicated to me at
> first look to combine these both usages.
>
> Another option is to use the BINARY format, but it is less portable
> compared to the TEXT format across machines architectures and
> PostgreSQL versions [1]. For CSV we can just wrap the string into ' but
> I think that we can have a performance issue. What do you think?
>
> I've also quickly benchmarked this change and I've got very similar
> execution time, with and without this change.
>

I've played with changing the format from TEXT to CSV to avoid this
duplicated code and I'm attaching a new version with the results. We
still need a special function to handle the escape but I think that it's
less complicated compared with TEXT.

I was a bit concerned about the performance so I've executed a benchmark
using pgbench that run a COPY FROM with 100 rows on a foreign table and
I've got the following results:

Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres

batch_size: 10
patch tps: 5588.402946
master tps: 4691.619829

batch_size: 100
patch tps: 11834.459579
master tps: 5578.925053

batch_size: 1000
patch tps: 11181.554907
master tps: 6452.945124

The results looks good, so I think that CSV is a valid format option.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

Attachment	Content-Type	Size
v14-0001-postgres_fdw-Use-COPY-as-remote-SQL-when-possibl.patch	text/plain	18.7 KB

In response to

Re: postgres_fdw: Use COPY to speed up batch inserts at 2026-04-01 15:50:14 from Matheus Alcantara

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2026-04-07 21:01:43	Re: Unfortunate pushing down of expressions below sort
Previous Message	Lukas Fittl	2026-04-07 20:30:11	Re: Stack-based tracking of per-node WAL/buffer usage