Quick Links

Re: postgres_fdw: Use COPY to speed up batch inserts

From:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: postgres_fdw: Use COPY to speed up batch inserts
Date:	2025-10-21 14:25:19
Message-ID:	CAFY6G8ePwjT8GiJX1AK5FDMhfq-sOnny6optgTPg98HQw7oJ0g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thanks for the review!

On Thu Oct 16, 2025 at 5:39 PM -03, Tomas Vondra wrote:
> Thanks for the patch. Please add it to the next committfest (PG19-3) at
>
> https://commitfest.postgresql.org/
>
> so that we don't lose track of the patch.
>
Here it is: https://commitfest.postgresql.org/patch/6137/

> Thanks. The code looks sensible in general, I think. I'll have a couple
> minor comments. It'd be good to also update the documentation, and add
> some tests to postgres_fdw.sql, to exercise this new code.
>
> The sgml docs (doc/src/sgml/postgres-fdw.sgml) mention batch_size, and
> explain how it's related to the number of parameters in the INSERT
> state. Which does not matter when using COPY under the hood, so this
> should be amended/clarified in some way. It doesn't need to be
> super-detailed, though.
>
I'll work on this and intend to have something on the next version.

> A couple minor comments about the code:
>
> 1) buildCopySql
>
> - Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK
> no, if we use the default copy format in convert_slot_to_copy_text.
>
You're right. I've removed the (FORMAT TEXT, DELIMITER ',') part.

> - Shouldn't we cache the COPY SQL, similarly to how we keep insert SQL?
> Instead of rebuilding it over and over for every batch.
>
I tried to reuse the fmstate->query field to cache the COPY sql but
running the postgres_fdw.sql regress test shows that this may not
work. When we are running a user supplied COPY command on a foreign
table the CopyMultiInsertBufferFlush() call
ri_FdwRoutine->ExecForeignBatchInsert which may pass different values
for numSlots based on the number of slots already sent to the foreign
server, and eventually it may pass numSlots as 1 which will not use the
COPY under the hood to send to the foreign server and if we cache the
COPY command into the fmstate->query this will not work because the
normal INSERT path on execute_foreign_modify uses the fmstate->query to
build a prepared statement to send to the foreign server. So basically
what I'm trying to say is that when the server is executing a COPY into
a foreign it may use the COPY command or INSERT command to send the data
to the foreign server. That being said, I decided to create a new
copy_query field on PgFdwModifyState to cache only COPY commands. Please
let me know if my understanding is wrong or if we could have a better
approach here.

> 2) convert_slot_to_copy_text
>
> - It's probably better to not hard-code the delimiters etc.
>
Since now we are using the default copy format, it's safe to hard code
the default delimiter which is \t? Or should we still make this
parameterizable or something else?

> - I wonder if the formatting needs to do something more like what
> copyto.c does (through CopyToTextOneRow and CopyAttributeOutText). Maybe
> not, not sure.
>
I'm still checking this, I think that is not needed but I want to do
some more tests to make sure.

> 3) execute_foreign_modify
>
> - I think the new block of code is a bit confusing. It essentially does
> something similar to the original code, but not entirely. I suggest we
> move it to a new function, and call that from execute_foreign_modify.
>
Fixed

> - I agree it's probably better to do COPYBUFSIZ, or something like that
> to send the data in smaller (but not tiny) chunks.
>
Fixed. I've declared the default chunk size as 8192 (which is the same
used on psql/copy.c), let me know if we should use another value or
perhaps make this configurable.

>> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE)
>> output for batch inserts that use the COPY to mention that we are
>> sending the COPY command to the remote server. I guess so?
>>
>
> Good point. We definitely should not show SQL for INSERT, when we're
> actually running a COPY.
>
This seems a bit tricky to implement. The COPY is used based on the
number of slots into the TupleTableSlot array that is used for batch
insert. The numSlots that execute_foreign_modify() receive is coming
from ResultRelInfo->ri_NumSlots during ExecInsert(). We don't have this
information during EXPLAIN that is handled by
postgresExplainForeignModify(), we only have the
ResultRelInfo->ri_BatchSize at this stage. The current idea is to use
the COPY command if the number of slots is > 1 so I'm wondering if we
should use another mechanism to enable the COPY usage, for example, we
could just use if the batch_size is configured to a number greater than
X, but what if the INSERT statement is only inserting a single row,
should we still use the COPY command to ingest a single row into the
foreign table? Any thoughts?

--
Matheus Alcantara

Attachment	Content-Type	Size
v2-0001-postgres_fdw-Use-COPY-to-speed-up-batch-inserts.patch	application/octet-stream	7.4 KB

In response to

Re: postgres_fdw: Use COPY to speed up batch inserts at 2025-10-16 20:39:47 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Matheus Alcantara	2025-10-21 14:26:50	Re: postgres_fdw: Use COPY to speed up batch inserts
Previous Message	Dimitrios Apostolou	2025-10-21 14:16:26	Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward