RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Andrey Lepikhov' <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [POC] Fast COPY FROM command for the table with foreign partitions
Date: 2020-10-20 02:30:57
Message-ID: TYAPR01MB2990DC396B338C98F27C8ED3FE1F0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrey-san,

Thanks for the revision. The patch looks good except for the following two items.

(18)
+ if (target_resultRelInfo->ri_FdwRoutine != NULL)
+ {
+ if (target_resultRelInfo->ri_usesMultiInsert)
+ {
+ Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+ target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+ resultRelInfo);
+ }

> > (14)
> > @@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState
> *mtstate,
> > ResultRelInfo *resultRelInfo = proute->partitions[i];
> >
> > /* Allow any FDWs to shut down */
> > - if (resultRelInfo->ri_FdwRoutine != NULL &&
> > - resultRelInfo->ri_FdwRoutine->EndForeignInsert !=
> NULL)
> > -
> resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
> > -
> resultRelInfo);
> > + if (resultRelInfo->ri_FdwRoutine != NULL)
> > + {
> > + if (resultRelInfo->ri_usesMultiInsert)
> > + {
> > +
> Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
> > +
> resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
> > +
> resultRelInfo);
> > + }
> > + else if
> (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
> > +
> resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
> > +
> resultRelInfo);
> > + }
> >
> > EndForeignCopy() is an optional function, isn't it? That is, it's called if it's
> defined.
> >
> ri_usesMultiInsert must guarantee that we will use multi-insertions. And we
> use only assertions to control this.

The code appears to require both BeginForeignCopy and EndForeignCopy, while the following documentation says they are optional. Which is correct? (I suppose the latter is correct just like other existing Begin/End functions are optional.)

+ If the <function>BeginForeignCopy</function> pointer is set to
+ <literal>NULL</literal>, no action is taken for the initialization.

+ If the <function>EndForeignCopy</function> pointer is set to
+ <literal>NULL</literal>, no action is taken for the termination.

> > (2)
> > + Assert(rri->ri_usesMultiInsert == false);
> >
> > As the above assertion represents, I'm afraid the semantics of
> ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are
> unclear. In CopyFrom(), ri_usesMultiInsert is set by also considering the
> COPY-specific conditions:
> >
> > + if (!cstate->volatile_defexprs &&
> > + !contain_volatile_functions(cstate->whereClause) &&
> > + ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
> > + target_resultRelInfo->ri_usesMultiInsert = true;
> >
> > On the other hand, in below ExecInitPartitionInfo(), ri_usesMultiInsert is set
> purely based on the relation's characteristics.
> >
> > + leaf_part_rri->ri_usesMultiInsert =
> > + ExecRelationAllowsMultiInsert(leaf_part_rri,
> rootResultRelInfo);
> >
> > In addition to these differences, I think it's a bit confusing that the function
> itself doesn't record the check result in ri_usesMultiInsert.
> >
> > It's probably easy to understand to not add ri_usesMultiInsert, and the
> function just encapsulates the check logic based solely on the relation
> characteristics and returns the result. So, the argument is just one
> ResultRelInfo. The caller (e.g. COPY) combines the function result with other
> specific conditions.

> I can't fully agreed with this suggestion. We do so because in the future anyone
> can call this code from another subsystem for another purposes. And we want
> all the relation-related restrictions contains in one routine. CopyState-related
> restrictions used in copy.c only and taken out of this function.

I'm sorry if I'm misinterpreting you, but I think the following simply serves its role sufficiently and cleanly without using ri_usesMultiInsert.

bool
ExecRelationAllowsMultiInsert(RelationRelInfo *rri)
{
check if the relation allows multiinsert based on its characteristics;
return true or false;
}

I'm concerned that if one subsystem sets ri_usesMultiInsert to true based on its additional specific conditions, it might lead to another subsystem's misjudgment. For example, when subsystem A and B want to do different things respectively:

[Subsystem A]
if (ExecRelationAllowsMultiInsert(rri) && {A's conditions})
rri->ri_usesMultiInsert = true;
...
if (rri->ri_usesMultiInsert)
do A's business;

[Subsystem B]
if (rri->ri_usesMultiInsert)
do B's business;

Here, what if subsystem A and B don't want each other's specific conditions to hold true? That is, A wants to do A's business only if B's specific conditions don't hold true. If A sets rri->ri_usesMultiInsert to true and passes rri to B, then B wrongly does B's business despite that A's specific conditions are true.

(I think this is due to some form of violation of encapsulation.)

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiro Ikeda 2020-10-20 02:31:11 Add statistics to pg_stat_wal view for wal related parameter tuning
Previous Message Amit Langote 2020-10-20 02:28:26 Re: partition routing layering in nodeModifyTable.c