Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-22 11:49:24
Message-ID: CAJcOf-deGKGsGCwBTpOiwoUxoY+AOAMfSZn9DW2A7NrYrWYCmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 22, 2021 at 7:52 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
>
> (1)
> - * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> - * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> - * backend writes into a completely new table. In the future, we can
> - * extend it to allow workers to write into the table. However, to allow
> - * parallel updates and deletes, we have to solve other problems,
> - * especially around combo CIDs.)
> + * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> + * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> + * of now, only the leader backend writes into a completely new table. In
>
> This can read "In INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a completely new table." The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?
>

Yes, you're right the wording is not right (and I don't really like
the wording used before the patch).

Perhaps it could say:

(Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
of now, other than in the case of INSERT INTO...SELECT, only the leader backend
writes into a completely new table. In the future, we can extend it to
allow workers for the
other commands to write into the table. However, to allow parallel
updates and deletes, we
have to solve other problems, especially around combo CIDs.)

Of course, this will need further updating when parallel CREATE TABLE
AS etc. is implemented ...

>
> (2)
> /*
> * RELATION_IS_LOCAL
> - * If a rel is either temp or newly created in the current transaction,
> - * it can be assumed to be accessible only to the current backend.
> - * This is typically used to decide that we can skip acquiring locks.
> + * If a rel is temp, it can be assumed to be accessible only to the
> + * current backend. This is typically used to decide that we can
> + * skip acquiring locks.
> *
> * Beware of multiple eval of argument
> */
> #define RELATION_IS_LOCAL(relation) \
> - ((relation)->rd_islocaltemp || \
> - (relation)->rd_createSubid != InvalidSubTransactionId)
> + ((relation)->rd_islocaltemp)
>
> How is this correct? At least, this change would cause a transaction that creates a new relation acquire an unnecessary lock. I'm not sure if that overhead is worth worrying about (perhaps not, I guess). But can we still check >rd_createSubid in non-parallel mode? If we adopt the above change, the comments at call sites need modification - "new or temp relation" becomes "temp relations".
>

The problem is, with the introduction of parallel INSERT, it's no
longer the case that newly-created tables can't be accessed by anyone
else in the same transaction - now, a transaction can include parallel
workers, inserting into the table concurrently. Without changing that
macro, things fail with a very obscure message (e.g. ERROR:
unexpected data beyond EOF in block 5 of relation base/16384/16388)
and it takes days to debug what the cause of it is.
Maybe updating the macro to still check rd_createSubid in non-parallel
mode is a good idea - I'll need to try it.
Other than that, each and every usage of RELATION_IS_LOCAL would need
to be closely examined, to see if it could be within a parallel
INSERT.

>
> (3)
> @@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
> ...
> - pstmt->commandType = CMD_SELECT;
> + Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
> + IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
> + pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;
>
> The last line can just be as follows, according to the Assert():
>
> + pstmt->commandType = estate->es_plannedstmt->commandType);
>

No, that's not right. I did that originally and it failed in some
cases (try changing it and then run the regression tests and you'll
see).
The commandType of the es_plannedstmt might be CMD_INSERT but the one
in the plan might be CMD_SELECT (for the underlying SELECT).

>
> (4)
> @@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
> estate->es_use_parallel_mode = use_parallel_mode;
> if (use_parallel_mode)
> {
> - PrepareParallelMode(estate->es_plannedstmt->commandType);
> + bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);
> +
> + PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
> EnterParallelMode();
> }
>
> @@ -1021,12 +1039,25 @@ IsInParallelMode(void)
> * Prepare for entering parallel mode, based on command-type.
> */
> void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
> {
> if (IsModifySupportedInParallelMode(commandType))
> {
> Assert(!IsInParallelMode());
>
> + if (isParallelModifyLeader)
> + {
> + /*
> + * Set currentCommandIdUsed to true, to ensure that the current
> + * CommandId (which will be used by the parallel workers) won't
> + * change during this parallel operation, as starting new
> + * commands in parallel-mode is not currently supported.
> + * See related comments in GetCurrentCommandId and
> + * CommandCounterIncrement.
> + */
> + (void) GetCurrentCommandId(true);
> + }
>
> I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan(). PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be parallel leader or a parallel worker.

You could, but I'm not sure it would make the code easier to read,
especially for those who don't know !isParallelWorker() means it's a
parallel leader.

>
> BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()? Someone who will read other call sites of EnterParallelMode() (index build, VACUUM) may be worried that PrepareParallelMode() call is missing there. Can we just add an argument to EnterParallelMode()? Other call sites can use CMD_UNKNOWN or CMD_UTILITY, if we want to use CMD_XX.
>

I really can't see a problem. PrepareParallelMode() is only needed
prior to execution of a parallel plan, so it's not needed for "other
call sites" using EnterParallelMode().
Perhaps the name can be changed to disassociate it from generic
EnterParallelMode() usage. So far, I've only thought of long names
like: PrepareParallelModePlanExec().
Ideas?

Regards,
Greg Nancarrow
Fujitsu Australia

.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-01-22 11:55:55 Re: Online checksums patch - once again
Previous Message Tang, Haiying 2021-01-22 11:46:28 RE: Parallel Inserts in CREATE TABLE AS