Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-01 12:41:41
Message-ID: CAJcOf-e_tzVzF+kp7HDM+UMNrfM+DUL0Ck3boOdp-DDygt5Jww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> + foreach(lcSub, rte->subquery->rtable)
> + {
> + rteSub = lfirst_node(RangeTblEntry, lcSub);
> + if (rteSub->rtekind == RTE_RELATION)
> + {
> + hasSubQueryOnRelation = true;
> + break;
> + }
> + }
> It seems we can not only search RTE_RELATION in rtable,
> because RTE_RELATION may exist in other place like:
>
> ---
> --** explain insert into target select (select * from test);
> Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
> In query's ctelist.
> ---
>
> May be we should use a walker function [1] to
> search the subquery and ctelist.
>

Yes, the current checks are too simple, as you point out, there seem
to be more complex cases that it doesn't pick up. Unfortunately
expanding the testing for them does detract from the original
intention of this code (which was to avoid extra parallel-safety check
processing on code which can't be run in parallel). I guess the
relation walker function should additionally check for SELECT queries
only (commandType == CMD_SELECT), and exclude SELECT FOR UPDATE/SHARE
(rowMarks != NIL) too. I'll need to look further into it, but will
certainly update the code for the next version of the patch.

> 2).
>
> +--
> +-- Test INSERT into temporary table with underlying query.
> +-- (should not use a parallel plan)
> +--
>
> May be the comment here need some change since
> we currently support parallel plan for temp table.
>

Thanks, it should say something like "should create the plan with
INSERT + parallel SELECT".

> 3)
> Do you think we can add a testcase for foreign-table ?
> To test parallel query with serial insert on foreign table.
>

I have intended to do it, but as a lower-priority task.

>
> [1]
> static bool
> relation_walker(Node *node)
> {
> if (node == NULL)
> return false;
>
> else if (IsA(node, RangeTblEntry))
> {
> RangeTblEntry *rte = (RangeTblEntry *) node;
> if (rte->rtekind == RTE_RELATION)
> return true;
>
> return false;
> }
>
> else if (IsA(node, Query))
> {
> Query *query = (Query *) node;
>
> /* Recurse into subselects */
> return query_tree_walker(query, relation_walker,
> NULL, QTW_EXAMINE_RTES_BEFORE);
> }
>
> /* Recurse to check arguments */
> return expression_tree_walker(node,
> relation_walker,
> NULL);
> }
>

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2021-02-01 12:41:59 Re: pgsql: Implementation of subscripting for jsonb
Previous Message Amit Kapila 2021-02-01 12:26:01 Re: Single transaction in the tablesync worker?