Re: postgres_fdw: Use COPY to speed up batch inserts

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date: 2026-03-30 19:14:51
Message-ID: CAD21AoCS4nBcJARX4RtaMFQSSTLGGndEB2kho_yPCvJnRwHFdg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 4, 2026 at 4:17 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> Thank you for the review.
>
> On 03/03/26 16:47, Masahiko Sawada wrote:
> > Thank you for updating the patch! Here are some review comments:
> >
> > + Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
> > +
> > + if (attr->attgenerated)
> > + continue;
> > +
> > + if (!first)
> > + appendStringInfoString(buf, ", ");
> > +
> > + first = false;
> > +
> > + appendStringInfoString(buf,
> > quote_identifier(NameStr(attr->attname)));
> >
> > We need to take care of the 'column_name' option here. If it's set we
> > should not use attr->attname as it is.
> >
>
> Fixed
>
> > --
> > + }
> > + if (nattrs > 0)
> > + appendStringInfoString(buf, ") FROM STDIN");
> > + else
> > + appendStringInfoString(buf, " FROM STDIN");
> > +}
> >
> > It might be better to explicitly specify the format 'text'.
> >
>
> Fixed
>
> > ---
> > + if (useCopy)
> > + {
> > + deparseCopySql(&sql, rel, targetAttrs);
> > + values_end_len = 0; /* Keep compiler quiet */
> > + }
> > + else
> > + deparseInsertSql(&sql, rte, resultRelation, rel,
> > targetAttrs, doNothing,
> > +
> > resultRelInfo->ri_WithCheckOptions,
> > +
> > resultRelInfo->ri_returningList,
> > + &retrieved_attrs,
> > &values_end_len);
> >
> > I think we should consider whether it's okay to use the COPY command
> > even if resultRelInfo->ri_WithCheckOptions is non-NULL. As far as I
> > researched, it's okay as we currently don't support COPY to a view but
> > please consider it as well. We might want to explain it too in the
> > comment.
> >
>
> Good point, fixed.
>
> > How about initializing values_end_len with 0 at its declaration?
> >
>
> Fixed
>
> > ---
> > +-- test that fdw also use COPY FROM as a remote sql
> > +set client_min_messages to 'log';
> > +
> > +create function insert_or_copy() returns trigger as $$
> > +declare query text;
> > +begin
> > + query := current_query();
> > + if query ~* '^COPY' then
> > + raise notice 'COPY command';
> > + elsif query ~* '^INSERT' then
> > + raise notice 'INSERT command';
> > + end if;
> > +return new;
> > +end;
> > +$$ language plpgsql;
> >
> > On second thoughts, it might be okay to output the current_query() as it is.
> >
>
> Fixed
>
> > ---
> > +copy grem1 from stdin;
> > +3
> > +\.
> >
> > I think it would be good to have more tests, for example, checking if
> > the COPY command method can work properly with batch_size and
> > column_name options.
> >
>
> I've added new test cases for these cases. To test the batch_size case
> I've added an elog(DEBUG1) because using the trigger with
> current_query() log an entry for each row that we send for the foreign
> server, with the elog(DEBUG1) we can expect one log message for each
> batch operation. Please let me know if there is better ways to do this.
>
> Please see the new attached version.

I've reviewed the v12 patch, and here are review comments:

The COPY data sent via postgres_fdw should properly escape the input
data. The bug I found can be reproduced with the following scenario:

-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table t (a int, b text) server remote;
create user mapping for public server remote;

-- on remote server
create table t (a int, b text);

-- on local server
copy t(a, d) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 hello\nworld
>> \.
ERROR: invalid input syntax for type integer: "world"
CONTEXT: COPY t, line 2, column a: "world"
remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT)

---
I think the patch misses calling set_transmission_modes() and
reset_transmission_modes() when converting the tuple slots. A
problematic scenario is:

-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table f (a float) server remote;
create user mapping for public server remote;

-- on remote server
create table f (a float);

-- on local server
set extra_float_digits = 0;
copy test_float from stdin;
1.0000000000000002
\.
select * from f;
a
----
1
(1 row)

---
+
+ appendStringInfo(buf, "COPY ");
+ deparseRelation(buf, rel);
+ if (nattrs > 0)
+ appendStringInfo(buf, "(");

appendStringInfoString() or appendStringInfoChar() should be used instead.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2026-03-30 19:20:47 Re: Refactor query normalization into core query jumbling
Previous Message Melanie Plageman 2026-03-30 19:14:49 Re: Don't synchronously wait for already-in-progress IO in read stream