Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-10 11:55:42
Message-ID: CAA4eK1LFg6i+n1Xc=vKR7OrQfUtCfa7+tbtUh4don1DtY5Agxg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 9, 2020 at 2:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> >
> > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > Also, I have attached a separate patch (requested by Andres Freund)
> > that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> > ..." to be parallel.
> >
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>

Few initial comments on 0004-ParallelInsertSelect:

1.
@@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation,
HeapTuple tup, TransactionId xid,
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-

I have speculated above [1] about whether we can change this into an
Assert condition instead of just removing the check. Have you
considered that suggestion?

2.
@@ -764,12 +778,13 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed
+ * if currentCommandIdUsed was already true at the start of the
+ * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise
+ * forbid setting currentCommandIdUsed because we have no provision
+ * for communicating this back to the leader.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}

Once we allow this, won't the next CommandCounterIncrement() in the
worker increment the commandId, leading to different commandIds being
used in the worker and the leader? Is that prevented in some way, and
if so, how? Can we document it?

3.
@@ -173,7 +173,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* against performing unsafe operations in parallel mode, but this gives a
* more user-friendly error message.
*/
- if ((XactReadOnly || IsInParallelMode()) &&
+ if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
ExecCheckXactReadOnly(queryDesc->plannedstmt);

I don't think the above change is correct. We need to extend the below
check in ExecCheckXactReadOnly() instead, because otherwise it can
allow Insert operations even when XactReadOnly is set, which we don't want.

ExecCheckXactReadOnly()
{
..
if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
..
}

4.
@@ -173,18 +175,20 @@ ExecSerializePlan(Plan *plan, EState *estate)
* PlannedStmt to start the executor.
- pstmt->hasReturning = false;
- pstmt->hasModifyingCTE = false;
+ pstmt->hasReturning = estate->es_plannedstmt->hasReturning;
+ pstmt->hasModifyingCTE = estate->es_plannedstmt->hasModifyingCTE;

Why change hasModifyingCTE?

5.
+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */

The multi-line comment should start from the second line. I see a
similar problem at other places in the patch as well.

6.
@@ -178,6 +214,25 @@ ExecGather(PlanState *pstate)
node->pei,
gather->initParam);

+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */
+ if (nodeModifyTableState->fireBSTriggers)
+ {
+ fireBSTriggers(nodeModifyTableState);
+ nodeModifyTableState->fireBSTriggers = false;
+
+ /*
+ * Disable firing of AFTER STATEMENT triggers by local
+ * plan execution (ModifyTable processing). These will be
+ * fired at end of Gather processing.
+ */
+ nodeModifyTableState->fireASTriggers = false;
+ }
+ }

Can we encapsulate this in a separate function? It seems a bit odd to
do this directly in ExecGather.

7.
@@ -418,14 +476,25 @@ ExecShutdownGatherWorkers(GatherState *node)
void
ExecShutdownGather(GatherState *node)
{
- ExecShutdownGatherWorkers(node);
+ if (node->pei == NULL)
+ return;

- /* Now destroy the parallel context. */
- if (node->pei != NULL)

So after this patch, if "node->pei == NULL" then we won't shut down
workers here? Why so?

8. You have made changes related to trigger execution for the Gather
node; don't we need similar changes for the GatherMerge node?

9.
@@ -383,7 +444,21 @@ cost_gather(GatherPath *path, PlannerInfo *root,

/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * For Parallel INSERT, provided no tuples are returned from workers
+ * to gather/leader node, don't add a cost-per-row, as each worker
+ * parallelly inserts the tuples that result from its chunk of plan
+ * execution. This change may make the parallel plan cheap among all
+ * other plans, and influence the planner to consider this parallel
+ * plan.
+ */
+ if (!(IsA(path->subpath, ModifyTablePath) &&
+ castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
+ castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
+ {
+ run_cost += parallel_tuple_cost * path->path.rows;
+ }

Shouldn't the last condition in the above check, "castNode(ModifyTablePath,
path->subpath)->returningLists != NULL", be
"castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
instead? Otherwise, when there is a returning list, it won't count the
cost of passing tuples via the Gather node. This might be the reason
for what Thomas has seen in his recent email [2].

10. Don't we need a change in cost_gather_merge similar to the one in
cost_gather? It seems you have made only partial changes for the
GatherMerge node.

[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BhUKGLZB%3D1Q%2BAQQEEmffr3bUMAh%2BJD%2BJ%2B7axv%2BK10Kea0U9TQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
