| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
| Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date: | 2026-03-30 19:14:51 |
| Message-ID: | CAD21AoCS4nBcJARX4RtaMFQSSTLGGndEB2kho_yPCvJnRwHFdg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Mar 4, 2026 at 4:17 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> Thank you for the review.
>
> On 03/03/26 16:47, Masahiko Sawada wrote:
> > Thank you for updating the patch! Here are some review comments:
> >
> > + Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
> > +
> > + if (attr->attgenerated)
> > + continue;
> > +
> > + if (!first)
> > + appendStringInfoString(buf, ", ");
> > +
> > + first = false;
> > +
> > + appendStringInfoString(buf,
> > quote_identifier(NameStr(attr->attname)));
> >
> > We need to take care of the 'column_name' option here. If it's set we
> > should not use attr->attname as it is.
> >
>
> Fixed
>
> > --
> > + }
> > + if (nattrs > 0)
> > + appendStringInfoString(buf, ") FROM STDIN");
> > + else
> > + appendStringInfoString(buf, " FROM STDIN");
> > +}
> >
> > It might be better to explicitly specify the format 'text'.
> >
>
> Fixed
>
> > ---
> > + if (useCopy)
> > + {
> > + deparseCopySql(&sql, rel, targetAttrs);
> > + values_end_len = 0; /* Keep compiler quiet */
> > + }
> > + else
> > + deparseInsertSql(&sql, rte, resultRelation, rel,
> > targetAttrs, doNothing,
> > +
> > resultRelInfo->ri_WithCheckOptions,
> > +
> > resultRelInfo->ri_returningList,
> > + &retrieved_attrs,
> > &values_end_len);
> >
> > I think we should consider whether it's okay to use the COPY command
> > even if resultRelInfo->ri_WithCheckOptions is non-NULL. As far as I
> > researched, it's okay as we currently don't support COPY to a view but
> > please consider it as well. We might want to explain it too in the
> > comment.
> >
>
> Good point, fixed.
>
> > How about initializing values_end_len with 0 at its declaration?
> >
>
> Fixed
>
> > ---
> > +-- test that fdw also use COPY FROM as a remote sql
> > +set client_min_messages to 'log';
> > +
> > +create function insert_or_copy() returns trigger as $$
> > +declare query text;
> > +begin
> > + query := current_query();
> > + if query ~* '^COPY' then
> > + raise notice 'COPY command';
> > + elsif query ~* '^INSERT' then
> > + raise notice 'INSERT command';
> > + end if;
> > +return new;
> > +end;
> > +$$ language plpgsql;
> >
> > On second thoughts, it might be okay to output the current_query() as it is.
> >
>
> Fixed
>
> > ---
> > +copy grem1 from stdin;
> > +3
> > +\.
> >
> > I think it would be good to have more tests, for example, checking if
> > the COPY command method can work properly with batch_size and
> > column_name options.
> >
>
> I've added new test cases for these cases. To test the batch_size case
> I've added an elog(DEBUG1) because using the trigger with
> current_query() log an entry for each row that we send for the foreign
> server, with the elog(DEBUG1) we can expect one log message for each
> batch operation. Please let me know if there is better ways to do this.
>
> Please see the new attached version.
I've reviewed the v12 patch, and here are review comments:
The COPY data sent via postgres_fdw should properly escape the input
data. The bug I found can be reproduced with the following scenario:
-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table t (a int, b text) server remote;
create user mapping for public server remote;
-- on remote server
create table t (a int, b text);
-- on local server
copy t(a, d) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 hello\nworld
>> \.
ERROR: invalid input syntax for type integer: "world"
CONTEXT: COPY t, line 2, column a: "world"
remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT)
---
I think the patch misses calling set_transmission_modes() and
reset_transmission_modes() when converting the tuple slots. A
problematic scenario is:
-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table f (a float) server remote;
create user mapping for public server remote;
-- on remote server
create table f (a float);
-- on local server
set extra_float_digits = 0;
copy test_float from stdin;
1.0000000000000002
\.
select * from f;
a
----
1
(1 row)
---
+
+ appendStringInfo(buf, "COPY ");
+ deparseRelation(buf, rel);
+ if (nattrs > 0)
+ appendStringInfo(buf, "(");
appendStringInfoString() or appendStringInfoChar() should be used instead.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Sami Imseih | 2026-03-30 19:20:47 | Re: Refactor query normalization into core query jumbling |
| Previous Message | Melanie Plageman | 2026-03-30 19:14:49 | Re: Don't synchronously wait for already-in-progress IO in read stream |