Re: Fast COPY FROM based on batch insert

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com
Subject: Re: Fast COPY FROM based on batch insert
Date: 2022-07-18 08:22:49
Message-ID: CAPmGK14ov-MbQtwHuSK5ESUeL1csgCVC1MAAwZxgV=NWsBE2ZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 3/22/22 06:54, Etsuro Fujita wrote:
> > * To allow foreign multi insert, the patch made an invasive change to
> > the existing logic to determine whether to use multi insert for the
> > target relation, adding a new member ri_usesMultiInsert to the
> > ResultRelInfo struct, as well as introducing a new function
> > ExecMultiInsertAllowed(). But I’m not sure we really need such a
> > change. Isn’t it reasonable to *adjust* the existing logic to allow
> > foreign multi insert when possible?
> Of course, such approach would look much better, if we implemented it.

> I'll ponder how to do it.

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly. Attached is an
updated patch. What do you think about that?

While working on the patch, I fixed a few issues as well:

+ if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+ resultRelInfo->ri_BatchSize =
+
resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c. For consistency I fixed this by
copying-and-pasting the code from that file.

+ * Also, the COPY command requires a non-zero input list of attributes.
+ * Therefore, the length of the attribute list is checked here.
+ */
+ if (!cstate->volatile_defexprs &&
+ list_length(cstate->attnumlist) > 0 &&
+ !contain_volatile_functions(cstate->whereClause))
+ target_resultRelInfo->ri_usesMultiInsert =
+ ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case. postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Which is the reason why this was added to the if-test? But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when called from
GetForeignModifyBatchSize. So I removed the attnumlist test from the
patch, and modified postgresGetForeignModifyBatchSize as such. I
might miss something, though.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-1.patch application/octet-stream 11.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-07-18 08:51:32 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message John Naylor 2022-07-18 08:18:44 Re: Proposal to introduce a shuffle function to intarray extension