Re: Fast COPY FROM based on batch insert

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com
Subject: Re: Fast COPY FROM based on batch insert
Date: 2022-09-27 09:03:51
Message-ID: CAPmGK1476Smi59cszMZM7z7oeFJ5NF9MOChmHjsF2OZ48eyg-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 10, 2022 at 5:30 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> > + /* If any rows were inserted, run AFTER ROW INSERT triggers. */
> > ...
> > + for (i = 0; i < inserted; i++)
> > + {
> > + TupleTableSlot *slot = rslots[i];
> > ...
> > + slot->tts_tableOid =
> > + RelationGetRelid(resultRelInfo->ri_RelationDesc);
> >
> > It seems the return value of `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a variable outside the for loop.
> > Inside the for loop, assign this variable to slot->tts_tableOid.
>
> Actually, I did this to match the code in ExecBatchInsert(), but that
> seems like a good idea, so I’ll update the patch as such in the next
> version.

Done. I also adjusted the code in CopyMultiInsertBufferFlush() a bit
further. No functional changes. I put back in the original position
an assertion ensuring the FDW supports batching. Sorry for the back
and forth. Attached is an updated version of the patch.

Other changes are:

* The previous patch modified postgres_fdw.sql so that the existing
test cases for COPY FROM were tested in batch-insert mode. But I
think we should keep them as-is to test the default behavior, so I
added test cases for this feature by copying-and-pasting some of the
existing test cases. Also, the previous patch added this:

+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+ server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+ server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+ begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+ for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.

Rather than doing so, I think it would be better to use a partitioned
table defined in the above section “test tuple routing for
foreign-table partitions”, to save cycles. So I changed this as such.

* I modified comments a bit further and updated docs.

That is it. I will review the patch a bit more, but I feel that it is
in good shape.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-3.patch application/octet-stream 23.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-09-27 09:28:18 RE: A doubt about a newly added errdetail
Previous Message Masahiko Sawada 2022-09-27 08:51:20 Re: [PATCH] Log details for client certificate failures