From: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: postgres_fdw: Use COPY to speed up batch inserts |
Date: | 2025-10-21 14:25:19 |
Message-ID: | CAFY6G8ePwjT8GiJX1AK5FDMhfq-sOnny6optgTPg98HQw7oJ0g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the review!
On Thu Oct 16, 2025 at 5:39 PM -03, Tomas Vondra wrote:
> Thanks for the patch. Please add it to the next committfest (PG19-3) at
>
> https://commitfest.postgresql.org/
>
> so that we don't lose track of the patch.
>
Here it is: https://commitfest.postgresql.org/patch/6137/
> Thanks. The code looks sensible in general, I think. I'll have a couple
> minor comments. It'd be good to also update the documentation, and add
> some tests to postgres_fdw.sql, to exercise this new code.
>
> The sgml docs (doc/src/sgml/postgres-fdw.sgml) mention batch_size, and
> explain how it's related to the number of parameters in the INSERT
> state. Which does not matter when using COPY under the hood, so this
> should be amended/clarified in some way. It doesn't need to be
> super-detailed, though.
>
I'll work on this and intend to have something on the next version.
> A couple minor comments about the code:
>
> 1) buildCopySql
>
> - Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK
> no, if we use the default copy format in convert_slot_to_copy_text.
>
You're right. I've removed the (FORMAT TEXT, DELIMITER ',') part.
> - Shouldn't we cache the COPY SQL, similarly to how we keep insert SQL?
> Instead of rebuilding it over and over for every batch.
>
I tried to reuse the fmstate->query field to cache the COPY sql but
running the postgres_fdw.sql regress test shows that this may not
work. When we are running a user supplied COPY command on a foreign
table the CopyMultiInsertBufferFlush() call
ri_FdwRoutine->ExecForeignBatchInsert which may pass different values
for numSlots based on the number of slots already sent to the foreign
server, and eventually it may pass numSlots as 1 which will not use the
COPY under the hood to send to the foreign server and if we cache the
COPY command into the fmstate->query this will not work because the
normal INSERT path on execute_foreign_modify uses the fmstate->query to
build a prepared statement to send to the foreign server. So basically
what I'm trying to say is that when the server is executing a COPY into
a foreign it may use the COPY command or INSERT command to send the data
to the foreign server. That being said, I decided to create a new
copy_query field on PgFdwModifyState to cache only COPY commands. Please
let me know if my understanding is wrong or if we could have a better
approach here.
> 2) convert_slot_to_copy_text
>
> - It's probably better to not hard-code the delimiters etc.
>
Since now we are using the default copy format, it's safe to hard code
the default delimiter which is \t? Or should we still make this
parameterizable or something else?
> - I wonder if the formatting needs to do something more like what
> copyto.c does (through CopyToTextOneRow and CopyAttributeOutText). Maybe
> not, not sure.
>
I'm still checking this, I think that is not needed but I want to do
some more tests to make sure.
> 3) execute_foreign_modify
>
> - I think the new block of code is a bit confusing. It essentially does
> something similar to the original code, but not entirely. I suggest we
> move it to a new function, and call that from execute_foreign_modify.
>
Fixed
> - I agree it's probably better to do COPYBUFSIZ, or something like that
> to send the data in smaller (but not tiny) chunks.
>
Fixed. I've declared the default chunk size as 8192 (which is the same
used on psql/copy.c), let me know if we should use another value or
perhaps make this configurable.
>> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE)
>> output for batch inserts that use the COPY to mention that we are
>> sending the COPY command to the remote server. I guess so?
>>
>
> Good point. We definitely should not show SQL for INSERT, when we're
> actually running a COPY.
>
This seems a bit tricky to implement. The COPY is used based on the
number of slots into the TupleTableSlot array that is used for batch
insert. The numSlots that execute_foreign_modify() receive is coming
from ResultRelInfo->ri_NumSlots during ExecInsert(). We don't have this
information during EXPLAIN that is handled by
postgresExplainForeignModify(), we only have the
ResultRelInfo->ri_BatchSize at this stage. The current idea is to use
the COPY command if the number of slots is > 1 so I'm wondering if we
should use another mechanism to enable the COPY usage, for example, we
could just use if the batch_size is configured to a number greater than
X, but what if the INSERT statement is only inserting a single row,
should we still use the COPY command to ingest a single row into the
foreign table? Any thoughts?
--
Matheus Alcantara
Attachment | Content-Type | Size |
---|---|---|
v2-0001-postgres_fdw-Use-COPY-to-speed-up-batch-inserts.patch | application/octet-stream | 7.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Matheus Alcantara | 2025-10-21 14:26:50 | Re: postgres_fdw: Use COPY to speed up batch inserts |
Previous Message | Dimitrios Apostolou | 2025-10-21 14:16:26 | Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward |