Re: postgres_fdw: Use COPY to speed up batch inserts

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date: 2026-01-27 19:17:33
Message-ID: CAD21AoAji7ETO0815Cac=7yrntB5o4qiJ9O0LaAH=bDpFAxdyA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 2, 2026 at 12:33 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Fri Jan 2, 2026 at 5:15 PM -03, Masahiko Sawada wrote:
> > +
> > + /*
> > + * Set batch_with_copy_threshold from foreign server/table options. We do
> > + * this outside of create_foreign_modify() because we only want to use
> > + * COPY as a remote SQL when a COPY FROM on a foreign table is executed or
> > + * an insert is being performed on a table partition. In both cases the
> > + * BeginForeignInsert fdw routine is called.
> > + */
> > + fmstate->batch_with_copy_threshold = get_batch_with_copy_threshold(rel);
> >
> > Does it mean that we could end up using the COPY method not only when
> > executing COPY FROM but also when executing INSERT with tuple
> > routings? If so, how does the EXPLAIN command show the remote SQL?
> >
> It meas that we could also use the COPY method to insert rows into a
> specific table partition that is a foreign table.
>
> Let's say that an user execute an INSERT INTO on a partitioned table
> that has partitions that are postgres_fdw tables, with this patch we
> could use the COPY method to insert the rows on these partitions. On
> this scenario we would not have issue with EXPLAIN output because
> currently we do not show the remote SQL being executed on each partition
> that is involved on the INSERT statement.
>
> If an user execute an INSERT directly into a postgres_fdw table we will
> use the normal INSERT statement as we use today.

I'm slightly concerned that it could be confusing for users if we use
the COPY method for the same table based on not only
batch_with_copy_threshold but also how to INSERT. For example, if we
insert tuples directly to a leaf partition, we always use INSERT. On
the other hand, if we insert tuples via its parent table, we would use
either COPY or INSERT based on the number of tuples and
batch_with_copy_threshold value. IIUC this behavior stems from FDW API
design (BeginForeignInsert callback is called only in cases of COPY or
tuple routing), which users would not be aware of in general. Also,
inserting tuples directly to a leaf partition is faster in general
than doing via the parent table, but the COPY method optimization is
available only in the latter case.

How about making use of COPY method only when users execute a COPY
command? Which seems more intuitive and a good start. We can
distinguish BeginForeignInsert called via COPY from called via INSERT
(tuple routing) by adding a flag to ModifyTableState or by checking if
the passed resultRelInfo == resultRelInfo->ri_RootResultRelInfo.

Alternative idea (or an improvement) would be to use the COPY method
whenever the number of buffered tuples exceeds the threshold. It would
cover more cases. Regarding the issue with EXPLAIN output, we could
output both queries (INSERT and COPY) with some contexts (e.g., the
threshold for the COPY method etc).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-01-27 19:24:30 Re: unnecessary executor overheads around seqscans
Previous Message Jeff Davis 2026-01-27 19:16:34 Re: Remaining dependency on setlocale()