Quick Links

Re: Parallel INSERT (INTO ... SELECT ...)

From:	Greg Nancarrow <gregn4422(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Amit Langote <amitlangote09(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject:	Re: Parallel INSERT (INTO ... SELECT ...)
Date:	2021-02-24 14:11:36
Message-ID:	CAJcOf-dsX+hnLaK19cAT2MjkVz0oS-eR=qused0dr2-r3vtepw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Feb 25, 2021 at 12:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Feb 24, 2021 at 6:21 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > Thanks, I'll try it.
> > > > I did, however, notice a few concerns with your suggested alternative fix:
> > > > - It is not restricted to INSERT (as current fix is).
> > > >
> > >
> > > So what? The Select also has a similar problem.
> > >
> >
> > Yes, but you're potentially now adding overhead to every
> > SELECT/UPDATE/DELETE with a subquery, by the added recursive checking
> > and walking done by the new call to max_parallel_hazard_walker().and
> > code block looking for a modifying CTE
> >
>
> Can you please share an example where it has added an overhead?
>
> > And anyway I'm not sure it's really right putting in a fix for SELECT
> > with a modifying CTE, into a patch that adds parallel INSERT
> > functionality - in any case you'd need to really spell this out in
> > code comments, as this is at best a temporary fix that would need to
> > be removed whenever the query rewriter is fixed to set hasModifyingCTE
> > correctly.
> >
> > > > - It does not set parse->hasModifyingCTE (as current fix does), so the
> > > > return value (PlannedStmt) from standard_planner() won't have
> > > > hasModifyingCTE set correctly in the cases where the rewriter doesn't
> > > > set it correctly (and I'm not sure what strange side effects ??? that
> > > > might have).
> > >
> > > Here end goal is not to set hasModifyingCTE but do let me know if you
> > > see any problem or impact.
> > >
> >
> > parse->hasModifyingCTE is not just used in the shortcut-test for
> > parallel-safety, its value is subsequently copied into the PlannedStmt
> > returned by standard_planner.
> > It's inconsistent to leave hasModifyingCTE FALSE when by the fix it
> > has found a modifying CTE.
> > Even if no existing tests detect an issue with this, PlannedStmt is
> > left with a bad hasModifyingCTE value in this case, so there is the
> > potential for something to go wrong.
> >
> > > > - Although the invocation of max_parallel_hazard_walker() on a RTE
> > > > subquery will "work" in finally locating your fix's added
> > > > "CommonTableExpr" parallel-safety disabling block for commandType !=
> > > > CMD_SELECT, it looks like it potentially results in checking and
> > > > walking over a lot of other stuff within the subquery not related to
> > > > CTEs. The current fix does a more specific and efficient search for a
> > > > modifying CTE.
> > > >
> > >
> > > I find the current fix proposed by you quite ad-hoc and don't think we
> > > can go that way.
> > >
> >
> > At least my current fix is very specific, efficient and clear in its
> > purpose, and suitably documented, so it is very clear when and how it
> > is to be removed, when the issue is fixed in the query rewriter.
> > Another concern with the alternative fix is that it always searches
> > for a modifying CTE, even when parse->hasModifyingCTE is true after
> > the query rewriter processing.
> >
>
> There is a check in standard_planner such that if
> parse->hasModifyingCTE is true then we won't try checking
> parallel-safety.
>

OK, I retract that last concern, parallel-safety checks are skipped
when parse->hasModifyingCTE is true.
Examples of overhead will need to wait until tomorrow (and would need
to test), but seems fairly clear max_parallel_hazard_walker() first
checks parallel-unsafe functions in the node, then does numerous
node-type checks before getting to CommonTableExpr - exactly how much
extra work would depend on the SQL.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Re: Parallel INSERT (INTO ... SELECT ...) at 2021-02-24 13:19:46 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	mariakatosvich	2021-02-24 14:33:53	Re: REINDEX backend filtering
Previous Message	Julien Rouhaud	2021-02-24 13:52:06	Re: archive_command / pg_stat_archiver & documentation