Re: postgres_fdw: Use COPY to speed up batch inserts

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>, "Tomas Vondra" <tomas(at)vondra(dot)me>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date: 2025-10-23 00:00:46
Message-ID: DDP9DDKVIYQ0.2LTDOCAWGG9M6@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue Oct 21, 2025 at 11:25 AM -03, Matheus Alcantara wrote:
>>> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE)
>>> output for batch inserts that use the COPY to mention that we are
>>> sending the COPY command to the remote server. I guess so?
>>>
>>
>> Good point. We definitely should not show SQL for INSERT, when we're
>> actually running a COPY.
>>
> This seems a bit tricky to implement. The COPY is used based on the
> number of slots into the TupleTableSlot array that is used for batch
> insert. The numSlots that execute_foreign_modify() receive is coming
> from ResultRelInfo->ri_NumSlots during ExecInsert(). We don't have this
> information during EXPLAIN that is handled by
> postgresExplainForeignModify(), we only have the
> ResultRelInfo->ri_BatchSize at this stage. The current idea is to use
> the COPY command if the number of slots is > 1 so I'm wondering if we
> should use another mechanism to enable the COPY usage, for example, we
> could just use if the batch_size is configured to a number greater than
> X, but what if the INSERT statement is only inserting a single row,
> should we still use the COPY command to ingest a single row into the
> foreign table? Any thoughts?
>
Thinking more about this I realize that when we are deparsing the remote
SQL to be sent to the foreign server at the planner phase (via
postgresPlanForeignModify()) we don't have the batch_size and number of
rows information, so currently we can not know at the plan time if the
COPY usage for a batch insert is visible or not because IIUC these
information are only visible at query runtime.

One way to make it possible is that we could simply use the
PgFdwModifyState->copy_data during postgresExplainForeignModify() if
it's not null. Since we will only have this information during query
execution the drawback of this approach is that we would only show the
COPY as a Remote SQL on during EXPLAIN(ANALYZE).

Please see the attached v3 version that implements this idea.

> I tried to reuse the fmstate->query field to cache the COPY sql but
> running the postgres_fdw.sql regress test shows that this may not
> work. When we are running a user supplied COPY command on a foreign
> table the CopyMultiInsertBufferFlush() call
> ri_FdwRoutine->ExecForeignBatchInsert which may pass different values
> for numSlots based on the number of slots already sent to the foreign
> server, and eventually it may pass numSlots as 1 which will not use the
> COPY under the hood to send to the foreign server and if we cache the
> COPY command into the fmstate->query this will not work because the
> normal INSERT path on execute_foreign_modify uses the fmstate->query to
> build a prepared statement to send to the foreign server. So basically
> what I'm trying to say is that when the server is executing a COPY into
> a foreign it may use the COPY command or INSERT command to send the data
> to the foreign server. That being said, I decided to create a new
> copy_query field on PgFdwModifyState to cache only COPY commands. Please
> let me know if my understanding is wrong or if we could have a better
> approach here.
>
Based on the information that I've mention above I think that we need
some way to not mix INSERT with COPY commands when executing a COPY in
a foreign table supplied by the user. Or we should disable the COPY
under the hood and always fallback to INSERT or enable the COPY to use
when the *numSlots is 1, so in case of an EXPLAIN(ANALYZE) output we can
show the Remote SQL correctly. Is that make sense?

I'm still not sure if the trigger to use the COPY command for batch
insert should be *numSlots > 1 or something else. I'm open for better
ideas.

Thoughts?

--
Matheus Alcantara

Attachment Content-Type Size
v3-0001-postgres_fdw-Use-COPY-to-speed-up-batch-inserts.patch text/plain 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-10-23 00:01:32 Re: Skip unregistered custom kinds on stats load
Previous Message Michael Paquier 2025-10-22 23:46:27 Re: Extended Statistics set/restore/clear functions.