Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-15 08:42:15
Message-ID: CAA4eK1Kb=LRDUFow7238Mr_1VRK-dnpk2jpYkX7x1jDyx2qEKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>

Here is an additional review of
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
quite a few comments raised on the V11-0001* patch. I suggest first
post a revised version of V11-0001* patch addressing those comments
and then you can separately post a revised version of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.

Few comments:
==============
1.
+char
+max_parallel_hazard_for_modify(Query *parse, const char
*initial_max_parallel_hazard)
{
..
+ return (rel_max_parallel_hazard_for_modify(rte->relid,
parse->commandType, &context, NoLock));
..
}

rel_max_parallel_hazard_for_modify()
{
..
+ rel = table_open(relid, lockmode);
..
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
..
+ /*
+ * Column default expressions for columns in the target-list are
+ * already being checked for parallel-safety in the
+ * max_parallel_hazard() scan of the query tree in standard_planner().
+ */
+
+ tupdesc = RelationGetDescr(rel);
}

Here, it seems we are accessing the relation descriptor without any
lock on the table which is dangerous considering the table can be
modified in a parallel session. Is there a reason why you think this
is safe? Did you check anywhere else such a coding pattern?

2.
+ /*
+ * If there are any index expressions, check that they are parallel-mode
+ * safe.
+ */
+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
+ if (max_parallel_hazard_test(max_hazard, context))
+ {
+ table_close(rel, lockmode);
+ return context->max_hazard;
+ }

Here and at all other similar places, the call to
max_parallel_hazard_test seems redundant because
index_expr_max_parallel_hazard_for_modify would have already done
that. Why can't we just return true/false from
index_expr_max_parallel_hazard_for_modify? The context would have been
already updated for max_hazard.

3.
@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = max_parallel_hazard_for_modify(parse,
&glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
}

I don't like this way of checking parallel_hazard for modify. This not
only duplicates some code in max_parallel_hazard_for_modify from
max_parallel_hazard but also appears quite awkward. Can we move
max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
after calling max_parallel_hazard_walker, we can check for modify
statement and call the new function.

4.
domain_max_parallel_hazard_for_modify()
{
..
+ if (isnull)
+ {
+ /*
+ * This shouldn't ever happen, but if it does, log a WARNING
+ * and return UNSAFE, rather than erroring out.
+ */
+ elog(WARNING, "null conbin for constraint %u", con->oid);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ break;
+ }
..
}
index_expr_max_parallel_hazard_for_modify()
{
..
+ if (index_expr_item == NULL) /* shouldn't happen */
+ {
+ index_close(index_rel, lockmode);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ return context->max_hazard;
+ }
..
}

It is not clear why the above two are shouldn't happen cases and if so
why we want to treat them as unsafe. Ideally, there should be an
Assert if these can't happen but it is difficult to decide without
knowing why you have considered them unsafe?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-01-15 08:47:41 Re: list of extended statistics on psql
Previous Message Kyotaro Horiguchi 2021-01-15 08:29:40 Re: Wrong HINT during database recovery when occur a minimal wal.