Re: Fast COPY FROM based on batch insert

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com
Subject: Re: Fast COPY FROM based on batch insert
Date: 2022-07-19 09:35:22
Message-ID: 57733596-5fd6-07ec-d65c-67411411f10e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18/7/2022 13:22, Etsuro Fujita wrote:
> On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 3/22/22 06:54, Etsuro Fujita wrote:
>>> * To allow foreign multi insert, the patch made an invasive change to
>>> the existing logic to determine whether to use multi insert for the
>>> target relation, adding a new member ri_usesMultiInsert to the
>>> ResultRelInfo struct, as well as introducing a new function
>>> ExecMultiInsertAllowed(). But I’m not sure we really need such a
>>> change. Isn’t it reasonable to *adjust* the existing logic to allow
>>> foreign multi insert when possible?
>> Of course, such approach would look much better, if we implemented it.
>
>> I'll ponder how to do it.
>
> I rewrote the decision logic to something much simpler and much less
> invasive, which reduces the patch size significantly. Attached is an
> updated patch. What do you think about that?
>
> While working on the patch, I fixed a few issues as well:
>
> + if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
> + resultRelInfo->ri_BatchSize =
> +
> resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
>
> When determining the batch size, I think we should check if the
> ExecForeignBatchInsert callback routine is also defined, like other
> places such as execPartition.c. For consistency I fixed this by
> copying-and-pasting the code from that file.
>
> + * Also, the COPY command requires a non-zero input list of attributes.
> + * Therefore, the length of the attribute list is checked here.
> + */
> + if (!cstate->volatile_defexprs &&
> + list_length(cstate->attnumlist) > 0 &&
> + !contain_volatile_functions(cstate->whereClause))
> + target_resultRelInfo->ri_usesMultiInsert =
> + ExecMultiInsertAllowed(target_resultRelInfo);
>
> I think “list_length(cstate->attnumlist) > 0” in the if-test would
> break COPY FROM; it currently supports multi-inserting into *plain*
> tables even in the case where they have no columns, but this would
> disable the multi-insertion support in that case. postgres_fdw would
> not be able to batch into zero-column foreign tables due to the INSERT
> syntax limitation (i.e., the syntax does not allow inserting multiple
> empty rows into a zero-column table in a single INSERT statement).
> Which is the reason why this was added to the if-test? But I think
> some other FDWs might be able to, so I think we should let the FDW
> decide whether to allow batching even in that case, when called from
> GetForeignModifyBatchSize. So I removed the attnumlist test from the
> patch, and modified postgresGetForeignModifyBatchSize as such. I
> might miss something, though.
Thanks a lot,
maybe you forgot this code:
/*
* If a partition's root parent isn't allowed to use it, neither is the
* partition.
*/
if (rootResultRelInfo->ri_usesMultiInsert)
leaf_part_rri->ri_usesMultiInsert =
ExecMultiInsertAllowed(leaf_part_rri);

Also, maybe to describe in documentation, if the value of batch_size is
more than 1, the ExecForeignBatchInsert routine have a chance to be called?

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2022-07-19 10:07:29 Re: [PATCH] Introduce array_shuffle() and array_sample()
Previous Message Bharath Rupireddy 2022-07-19 09:13:59 Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages