| From: | solaimurugan vellaipandiyan <drsolaimurugan(dot)v(at)gmail(dot)com> |
|---|---|
| To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date: | 2026-05-07 11:38:38 |
| Message-ID: | CAHEL7KSWJD5AiDDi+06eDVNiv8fA3z+wa393nBt002TPOe=awg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi all,
Thank you for the updated Patch.
On Thu, May 7, 2026 at 4:44 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Wed Apr 1, 2026 at 12:50 PM -03, Matheus Alcantara wrote:
> > On Mon Mar 30, 2026 at 4:14 PM -03, Masahiko Sawada wrote:
> >>> Please see the new attached version.
> >>
> >> I've reviewed the v12 patch, and here are review comments:
> >>
> >> The COPY data sent via postgres_fdw should properly escape the input
> >> data. The bug I found can be reproduced with the following scenario:
> >>
> >> -- on local server
> >> create server remote foreign data wrapper postgres_fdw;
> >> create foreign table t (a int, b text) server remote;
> >> create user mapping for public server remote;
> >>
> >> -- on remote server
> >> create table t (a int, b text);
> >>
> >> -- on local server
> >> copy t(a, d) from stdin;
> >> Enter data to be copied followed by a newline.
> >> End with a backslash and a period on a line by itself, or an EOF signal.
> >>>> 1 hello\nworld
> >>>> \.
> >> ERROR: invalid input syntax for type integer: "world"
> >> CONTEXT: COPY t, line 2, column a: "world"
> >> remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT)
> >>
> >
> > I think that we need something like CopyAttributeOutText() here.
> >
> > To fix this I've added appendStringInfoText() which is a similar version
> > of CopyAttributeOutText() that works with a StringInfo. I did not find
> > any function that I could reuse here, if such function exists please let
> > me know.
> >
> > I'm wondering if we should have this similar function or try to combine
> > both to avoid duplicated logic, although it looks complicated to me at
> > first look to combine these both usages.
> >
> > Another option is to use the BINARY format, but it is less portable
> > compared to the TEXT format across machines architectures and
> > PostgreSQL versions [1]. For CSV we can just wrap the string into ' but
> > I think that we can have a performance issue. What do you think?
> >
> > I've also quickly benchmarked this change and I've got very similar
> > execution time, with and without this change.
> >
>
> I've played with changing the format from TEXT to CSV to avoid this
> duplicated code and I'm attaching a new version with the results. We
> still need a special function to handle the escape but I think that it's
> less complicated compared with TEXT.
>
> I was a bit concerned about the performance so I've executed a benchmark
> using pgbench that run a COPY FROM with 100 rows on a foreign table and
> I've got the following results:
>
> Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
>
> batch_size: 10
> patch tps: 5588.402946
> master tps: 4691.619829
>
> batch_size: 100
> patch tps: 11834.459579
> master tps: 5578.925053
>
> batch_size: 1000
> patch tps: 11181.554907
> master tps: 6452.945124
>
> The results looks good, so I think that CSV is a valid format option.
>
>
The patch applied successfully in my branch tree and tested the patch
by creating 2 clusters (Local cluster - as the foreign table side &
Remote cluster - as the remote table side).
I first reproduced the current behavior on an unpatched tree and observed that:
COPY ft_emp FROM stdin; resulted in row-by-row remote INSERT execution.
After applying the patch and rebuilding along with postgres_fdw, I
repeated the same test and observed the following in the remote
cluster logs as:
COPY public.emp(id, name) FROM STDIN (FORMAT CSV) that indicates the
remote COPY optimization path is being used successfully.
Local Cluster:
postgres=# COPY ft_emp FROM stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 200 A
>> 201 B
>> 202 C
>> \.
COPY 3
Remote Cluster:
postgres=# 2026-05-07 16:30:07.800 IST [210202] LOG: statement: START
TRANSACTION ISOLATION LEVEL REPEATABLE READ
2026-05-07 16:30:42.973 IST [210202] LOG: statement: COPY
public.emp(id, name) FROM STDIN (FORMAT CSV)
2026-05-07 16:30:42.976 IST [210202] LOG: statement: COPY
public.emp(id, name) FROM STDIN (FORMAT CSV)
2026-05-07 16:30:42.977 IST [210202] LOG: statement: COPY
public.emp(id, name) FROM STDIN (FORMAT CSV)
2026-05-07 16:30:42.977 IST [210202] LOG: statement: COMMIT TRANSACTION
Also I tested the behavior with a BEFORE INSERT FOR EACH ROW trigger
defined on the remote table. In my testing, one thing I observed is
that the remote COPY path was still chosen even with a BEFORE INSERT
FOR EACH ROW trigger defined on the remote table and the trigger fired
correctly during COPY execution. I wanted to confirm whether this is
the expected behavior/design for the optimization path. Otherwise the
patch looks good to me from my side.
Looking forward to more feedback on this.
Regards,
Solaimurugan V
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jakob Egger | 2026-05-07 11:41:11 | Broken build on macOS (Universal / Intel): cpuid instruction not available |
| Previous Message | solaimurugan vellaipandiyan | 2026-05-07 11:32:56 | Re: on_error table, saving error info to a table |