Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 03:44:12
Message-ID: CAJcOf-dEYwef0J5N+bdXQDwzTxWLU-josfxbNEUW43kQ-5u42g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 6, 2021 at 7:39 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> PlannerGlobal *glob = root->glob;
> int rtoffset = list_length(glob->finalrtable);
> ListCell *lc;
> + Plan *finalPlan;
>
> /*
> * Add all the query's RTEs to the flattened rangetable. The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> }
>
> /* Now fix the Plan tree */
> - return set_plan_refs(root, plan, rtoffset);
> + finalPlan = set_plan_refs(root, plan, rtoffset);
> + if (finalPlan != NULL && IsA(finalPlan, Gather))
> + {
> + Plan *subplan = outerPlan(finalPlan);
> +
> + if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> + {
> + finalPlan->targetlist = copyObject(subplan->targetlist);
> + }
> + }
> + return finalPlan;
> }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>
> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
> /*
> * Set up the visible plan targetlist as being the same as
> * the first RETURNING list. This is for the use of
> * EXPLAIN; the executor won't pay any attention to the
> * targetlist. We postpone this step until here so that
> * we don't have to do set_returning_clause_references()
> * twice on identical targetlists.
> */
> splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
> /*
> * Initialize result tuple slot and assign its rowtype using the first
> * RETURNING list. We assume the rest will look the same.
> */
> mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist. However I don't guarantee that this is the best approach - some
> planner expert should speak up.
>

I've had a bit closer look at this particular issue.
I can see what you mean about the ModifyTable targetlist (that is set
in set_plan_refs()) getting overwritten by ExecInitModifyTable() -
which also contradicts the comment in set_plan_refs() that claims the
targetlist being set is used by EXPLAIN (which it is not). So the
current Postgres master branch does seem to be broken in that respect.

I did try your suggestion (and also remove my tweak), but I found that
in the T_Gather case of set_plan_refs() it does some processing (see
set_upper_references()) of the current Gather targetlist and subplan's
targetlist (and will then overwrite the Gather targetlist after that),
but in doing that processing it produces the error:
ERROR: variable not found in subplan target list
I think that one of the fundamental problems is that, up to now,
ModifyTable has always been the top node in a (non-parallel) plan, but
now for Parallel INSERT we have a Gather node with ModifyTable in its
subplan. So the expected order of processing and node configuration
has changed.
For the moment (until perhaps a planner expert DOES speak up) I've
parked my temporary "fix" at the bottom of set_plan_refs(), simply
pointing the Gather node's targetlist to subplan's ModifyTable
targetlist.

if (nodeTag(plan) == T_Gather)
{
Plan *subplan = plan->lefttree;

if (IsA(subplan, ModifyTable) &&
castNode(ModifyTable, subplan)->returningLists != NIL)
{
plan->targetlist = subplan->targetlist;
}
}

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-02-10 04:00:53 Re: [POC] verifying UTF-8 using SIMD instructions
Previous Message Amit Langote 2021-02-10 03:38:53 Re: Parallel INSERT (INTO ... SELECT ...)