Re: postgres_fdw: Use COPY to speed up batch inserts

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date: 2025-11-19 23:32:46
Message-ID: CAD21AoBhVS1R8pzvMePsp4ngRTvwOeM0qTkqFmFOj_F9LGNFuw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 18, 2025 at 2:13 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Mon Nov 17, 2025 at 11:03 PM -03, Masahiko Sawada wrote:
> > IIUC the performance regression occurs when users insert many rows
> > into a foreign table with batch_size = 1 and use_copy_for_insert =
> > true (tps: 133.451518 vs. 132.644801 vs. 88.998804). Since batch_size
> > defaults to 1, users might experience performance issues if they
> > enable use_copy_for_insert without adjusting the batch_size. I'm
> > worried that users could easily find themselves in this situation.
> >
> Yes, you are correct. The 0002 patch aims to reduce this issue by using
> the COPY command only if use_copy_for_insert = true and batch_size > 1,
> which narrows the cases, but the regression can still happen if the
> user sends a single row to insert into a foreign table.
>
> Inserting a single row into a foreign table using COPY is a bit slower
> than using INSERT. See the following pgbench results:
>
> (Single row using INSERT)
> tps = 19814.535944
>
> (Single row using COPY)
> tps = 16562.324025
>
> I think that the documentation should mention that just changing
> use_copy_for_insert without also changing the batch_size option could
> cause a performance regression.
>
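
For reference, this kind of single-row comparison can be driven with a
pgbench custom script along the following lines (the foreign table name
ft and the file name are hypothetical):

    -- single_row.sql: one-row insert going through postgres_fdw
    INSERT INTO ft (c) VALUES (1);

Running it as "pgbench -n -f single_row.sql", once against a foreign
table with use_copy_for_insert = false and once with it set to true,
should reproduce the INSERT vs. COPY gap above.
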
> > One possible solution would be to introduce a threshold, like
> > copy_min_row, which would specify the minimum number of rows needed
> > before switching to the COPY command. However, this would require
> > coordination with batch_size since having copy_min_row lower than
> > batch_size wouldn't make sense.
> >
> The only problem that I see with this approach is that it would make
> the remote SQL shown by EXPLAIN (VERBOSE) and EXPLAIN (ANALYZE,
> VERBOSE) differ. The user would never know from plain EXPLAIN (without
> ANALYZE) whether COPY will be used. Is this a problem, or am I being
> too conservative?

I think that's a valid concern. Is it a good idea to show both queries
with some additional information (e.g., the threshold at which we
switch to the COPY command)?
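
Purely as a sketch of what that could look like (the labels and the
threshold wording here are made up):

    EXPLAIN (VERBOSE) INSERT INTO ft VALUES (1);
    ...
      Remote SQL: INSERT INTO public.t(c) VALUES ($1)
      Remote SQL (batches of at least batch_size rows):
        COPY public.t(c) FROM STDIN

That way plain EXPLAIN would still tell the user that COPY may be used.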

> I think that we can do such coordination in postgres_fdw_validator().
>
> Also, if we decide to go with this idea, it seems to me that we would
> have too many table options to configure to enable the COPY
> optimization: we would need "copy_min_row", "batch_size", and
> "use_copy_for_insert". What about deciding to use the COPY command if
> use_copy_for_insert = true and the number of rows being inserted is at
> least batch_size?

Sounds like a reasonable idea.
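
With that rule the user-facing setup stays small, e.g. (option names as
in the patch; values illustrative):

    ALTER FOREIGN TABLE ft
      OPTIONS (ADD batch_size '100', ADD use_copy_for_insert 'true');

and COPY would only be chosen when a statement actually inserts at
least 100 rows, which avoids the single-row regression above.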

>
> > Alternatively, when users are using batch insertion (batch_size > 0),
> > we could use the COPY command only for full batches and fall back to
> > INSERT for partial ones.
> >
> IIUC in this case we would send both COPY and INSERT statements to the
> foreign server for the same execution. For example, if batch_size = 100
> and the user tries to insert 105 rows into the foreign table, we would
> send a COPY statement with 100 rows and then an INSERT with the 5
> remaining rows? If that's the case, which SQL should we show as the
> Remote SQL in the EXPLAIN (ANALYZE, VERBOSE) output? I think this could
> cause some confusion.
>
> > BTW I noticed that the use_copy_for_insert option doesn't work with
> > the COPY FROM command. I got the following error with
> > use_copy_for_insert = true and batch_size = 3:
> >
> > postgres(1:2546195)=# copy t from '/tmp/a.csv'; -- table 't' is a foreign table.
> > ERROR: there is no parameter $1
> > CONTEXT: remote SQL command: INSERT INTO public.t(c) VALUES ($1)
> > COPY t
> >
> Thanks for testing this case. The problem was that I was checking
> whether COPY could be used inside create_foreign_modify(), which is
> called by both BeginForeignInsert() and BeginForeignModify(), but COPY
> can only be used on the foreign modify path. To fix this issue I've
> moved the check to postgresBeginForeignModify().
>
> I'm attaching v6 with the following changes:
> - I've squashed 0002 into 0001, so now COPY will only be used if
>   use_copy_for_insert = true and batch_size > 1
> - A fix for the bug with COPY FROM into a foreign table
> - A new test case for the COPY bug

Thank you for updating the patch!

I think one key point in the patch is whether or not it's okay to
switch to using COPY based on the actual number of tuples inserted.
While it should be okay from a performance perspective, it could be an
issue that the remote query shown in EXPLAIN (without ANALYZE) may
differ from the actual query sent. If there is a way in postgres_fdw to
distinguish batch insertion via INSERT from batch insertion via COPY,
it might be a good idea to use the COPY command for the remote query
only when a local COPY FROM comes in.
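
In other words, something like this distinction (ft is a hypothetical
foreign table):

    -- Regular DML path: the remote query stays INSERT, matching what
    -- EXPLAIN (VERBOSE) reports.
    INSERT INTO ft SELECT g FROM generate_series(1, 1000) g;

    -- Bulk-load path: COPY has no EXPLAIN output to contradict, so it
    -- could safely map to a remote COPY.
    COPY ft FROM '/tmp/a.csv';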

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
