Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-15 12:43:48
Message-ID: CAA4eK1Lk37E2nrJb9WGyaAjA3FeM+hLM1kE0Jyfb49HeKqoWtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It might be a good idea to first just get this patch committed, if
> > possible. So, I have reviewed the latest version of this patch:
> >
> > 0001-InsertParallelSelect
>
> I've attached an updated InsertParallelSelect patch (v2) - allowing
> underlying parallel SELECT for "INSERT INTO ... SELECT ...".
> I think I've addressed most of the issues identified in the previous
> version of the patch.
> I'm still seeing a couple of errors in the tests when
> "force_parallel_mode=regress" is in effect, and those need to be
> addressed (extra checks required to avoid parallel SELECT in certain
> cases).
>

I am getting below error in force_parallel_mode:
@@ -1087,9 +1087,14 @@
ERROR: value for domain inotnull violates check constraint "inotnull_check"
create table dom_table (x inotnull);
insert into dom_table values ('1');
+ERROR: cannot start commands during a parallel operation
+CONTEXT: SQL function "sql_is_distinct_from"

It happened with below test:

create function sql_is_distinct_from(anyelement, anyelement)
returns boolean language sql
as 'select $1 is distinct from $2 limit 1';

create domain inotnull int
check (sql_is_distinct_from(value, null));

create table dom_table (x inotnull);
insert into dom_table values ('1');

So it is clear that this is happening because we have allowed insert
that is parallel-unsafe. The attribute is of type domain which has a
parallel-unsafe constraint. As per your current code, we need to
detect it in IsRelParallelModeSafeForModify. The idea would be to
check the type of each attribute and if it is domain type then we need
to check if it has a constraint (See function ExecGrant_Type on how to
detect a domain type and then refer to functions
AlterTypeNamespaceInternal and AlterConstraintNamespaces to know how
to determine constraint for domain type). Once you can find a
constraint then you already have code in your patch to find if it
contains parallel-unsafe expression.

> Also, I'm seeing a partition-related error when running
> installcheck-world - I'm investigating that.
>

Okay.

Few more comments:
==================
1.
+ /*
+ * Can't support execution of row-level or transition-table triggers
+ * during parallel-mode, since such triggers may query the table
+ * into which the data is being inserted, and the content returned
+ * would vary unpredictably according to the order of retrieval by
+ * the workers and the rows already inserted.
+ */
+ if (trigdesc != NULL &&
+ (trigdesc->trig_insert_instead_row ||
+ trigdesc->trig_insert_before_row ||
+ trigdesc->trig_insert_after_row ||
+ trigdesc->trig_insert_new_table))
+ {
+ return false;
+ }

I don't think it is a good idea to prohibit all before/after/instead
row triggers because for the case you are referring to should mark
trigger functions as parallel-unsafe. We might want to have to Assert
somewhere to detect if there is illegal usage but I don't see the need
to directly prohibit them.

2.
@@ -56,8 +57,8 @@ GetNewTransactionId(bool isSubXact)
* Workers synchronize transaction state at the beginning of each parallel
* operation, so we can't account for new XIDs after that point.
*/
- if (IsInParallelMode())
- elog(ERROR, "cannot assign TransactionIds during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign TransactionIds in a parallel worker");

/*
* During bootstrap initialization, we return the special bootstrap
diff --git a/src/backend/access/transam/xact.c
b/src/backend/access/transam/xact.c
index af6afce..ef423fb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -580,8 +580,8 @@ AssignTransactionId(TransactionState s)
* Workers synchronize transaction state at the beginning of each parallel
* operation, so we can't account for new XIDs at this point.
*/
- if (IsInParallelMode() || IsParallelWorker())
- elog(ERROR, "cannot assign XIDs during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign XIDs in a parallel worker");

I think we don't need these changes at least for the purpose of this
patch if you follow the suggestion related to having a new function
like PrepareParallelMode in the email above [1]. One problem I see
with removing these checks is how do we ensure that leader won't
assign a new transactionid once we start executing a parallel node. It
can do via sub-transactions maybe that is already protected at some
previous point but I would like to see if we can retain these checks.

[1] - https://www.postgresql.org/message-id/CAA4eK1JogfXUa%3D3wMPO%2BK%3DUiOLgHgCO7-fj1wCHsSxdaXsfVbw%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-10-15 12:52:21 Re: pgsql: Restore replication protocol's duplicate command tags
Previous Message Fujii Masao 2020-10-15 10:49:32 Re: New statistics for tuning WAL buffer size