Re: Parallel INSERT (INTO ... SELECT ...)

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-11-03 06:25:47
Message-ID: CALDaNm2GK8+0uicSvP3hKYqREymNus5GoxtwhvfTx7Mx2O=qWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> See attached patches.
>

Thanks for providing the patches.
I had reviewed
v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch, please find
my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to
be
+ * included in the transaction state that is serialized in the parallel
DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+ Assert(!IsInParallelMode());
+
+ (void)GetCurrentTransactionId();
+}

-> As we support insertion of data from the workers, this comments "but as
of now, only the leader backend writes into a completely new table. In the
future, we can extend it to allow workers to write into the table" must be
updated accordingly:
+ * modify any data using a CTE, or if this is a cursor operation,
or if
+ * GUCs are set to values that don't permit parallelism, or if
+ * parallel-unsafe functions are present in the query tree.
*
- * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+ * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and
CREATE
* MATERIALIZED VIEW to use parallel plans, but as of now, only the
leader
* backend writes into a completely new table. In the future, we
can
* extend it to allow workers to write into the table. However, to
allow

-> Also should we specify insert as "insert into select"

-> We could include a small writeup of the design may be in the commit
message. It will be useful for review.

-> I felt the below two assignment statements can be in the else condition:
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard !=
PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in
order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in
parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard =
MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard !=
PROPARALLEL_UNSAFE);
+ }

something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse,
&glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

-> Comments need slight adjustment, maybe you could run pgindent for the
modified code.
+ /*
+ * Supported table-modification commands may require
additional steps
+ * prior to entering parallel mode, such as assigning a
FullTransactionId.
+ */

-> In the below, max_parallel_hazard_test will return true for
PROPARALLEL_RESTRICTED also, Is break intentional in that case? As in case
of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
+ if
(max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+ break;
+
+ /*
+ * If the trigger type is RI_TRIGGER_FK, this indicates a
FK exists in
+ * the relation, and this would result in creation of new
CommandIds
+ * on insert/update/delete and this isn't supported in a
parallel
+ * worker (but is safe in the parallel leader).
+ */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_FK)
+ {
+ context->max_hazard = PROPARALLEL_RESTRICTED;
+ /*
+ * As we're looking for the max parallel hazard, we
don't break
+ * here; examine any further triggers ...
+ */
+ }

-> Should we switch to non-parallel mode in this case, instead of throwing
error?
+ val = SysCacheGetAttr(CONSTROID, tup,
+ Anum_pg_constraint_conbin,
&isnull);
+ if (isnull)
+ elog(ERROR, "null conbin for constraint
%u", con->oid);
+ conbin = TextDatumGetCString(val);

-> We could include a few tests for this in regression.

-> We might need some documentation update like in
parallel-query.html/parallel-plans.html, etc

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-11-03 07:05:15 Re: public schema default ACL
Previous Message Amit Kapila 2020-11-03 05:42:53 Re: Fix typo in xlogreader.h and procarray.c