Re: Fast COPY FROM based on batch insert

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com
Subject: Re: Fast COPY FROM based on batch insert
Date: 2022-07-22 06:39:23
Message-ID: 05bc8819-3e53-d884-a0d0-a17ffa5febef@postgrespro.ru
Lists: pgsql-hackers

On 7/20/22 13:10, Etsuro Fujita wrote:
> On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 18/7/2022 13:22, Etsuro Fujita wrote:
>>> I rewrote the decision logic to something much simpler and much less
>>> invasive, which reduces the patch size significantly. Attached is an
>>> updated patch. What do you think about that?
>
>> maybe you forgot this code:
>>     /*
>>      * If a partition's root parent isn't allowed to use it, neither is the
>>      * partition.
>>      */
>>     if (rootResultRelInfo->ri_usesMultiInsert)
>>         leaf_part_rri->ri_usesMultiInsert =
>>             ExecMultiInsertAllowed(leaf_part_rri);
>
> I think the patch accounts for that. Consider this bit to determine
> whether to use batching for the partition chosen by
> ExecFindPartition():
Agreed.

While analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the
parent doesn't have any triggers.
This comes from the following code:
    else if (insertMethod == CIM_MULTI_CONDITIONAL &&
             !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
    {
        /*
         * Flush pending inserts if this partition can't use
         * batching, so rows are visible to triggers etc.
         */
        CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
    }
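
To make the effect concrete, here is a rough toy model of that branch. It
is not PostgreSQL code: ToyPartition, toy_flush_all() and toy_copy_row()
are names invented for this sketch, and it assumes the flush call empties
the buffers of every partition rather than only the current one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-partition COPY state. */
typedef struct ToyPartition
{
    bool        can_batch;  /* false if, e.g., a trigger prevents batching */
    int         buffered;   /* rows waiting in this partition's buffer */
    int         written;    /* rows already written to the table */
} ToyPartition;

/* Flush every partition's buffer; returns the number of rows written out. */
static int
toy_flush_all(ToyPartition *parts, int nparts)
{
    int         flushed = 0;

    for (int i = 0; i < nparts; i++)
    {
        flushed += parts[i].buffered;
        parts[i].written += parts[i].buffered;
        parts[i].buffered = 0;
    }
    return flushed;
}

/*
 * Route one row to partition "target".  Returns how many buffered rows of
 * any partition were forced out early by this row.
 */
static int
toy_copy_row(ToyPartition *parts, int nparts, int target)
{
    int         forced = 0;

    assert(target >= 0 && target < nparts);

    if (parts[target].can_batch)
        parts[target].buffered++;   /* just queue the row */
    else
    {
        /* Models the branch above: everything pending is flushed. */
        forced = toy_flush_all(parts, nparts);
        parts[target].written++;    /* single-row insert */
    }
    return forced;
}
```

In this model, with two batching partitions holding three buffered rows
between them, routing one row to a third partition that cannot batch
returns 3: all pending rows of the unrelated partitions are written out
early.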

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers? An AFTER trigger should see all rows of the table, but
if one doesn't exist on the parent, I think we aren't obliged to
guarantee the order of COPY into two different tables.

--
Regards
Andrey Lepikhov
Postgres Professional
