Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-23 11:53:05
Message-ID: CAA4eK1+Rt1qMVPUOHw42VND-dtoCBmX0_Xv3uDc_L3Ke0ATN8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 23, 2021 at 4:47 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > > > It also occurred to me that we can avoid pointless adding of
> > > > > > > > > partitions if the final plan won't use parallelism. For that, the
> > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the
> > > > > > > > > trick though I don't know if that's the correct way of doing that.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT
> > > > > > > > using parallelism, because we may be relying on the result of
> > > > > > > > parallel-safety checks on partitions in both cases.
> > > > > > > > For example, insert_parallel.sql currently includes a test (that you
> > > > > > > > originally provided in a previous post) that checks a non-parallel
> > > > > > > > plan is generated after a parallel-unsafe trigger is created on a
> > > > > > > > partition involved in the INSERT.
> > > > > > > > If I further add to that test by then dropping that trigger and then
> > > > > > > > again using EXPLAIN to see what plan is generated, then I'd expect a
> > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still
> > > > > > > > generates a non-parallel plan. So I think the "&&
> > > > > > > > glob->parallelModeNeeded" part of test needs to be removed.
> > > > > > >
> > > > > > > Ah, okay, I didn't retest my case after making that change.
> > > > > > >
> > > > > >
> > > > > > Greg has point here but I feel something on previous lines (having a
> > > > > > test of glob->parallelModeNeeded) is better. We only want to
> > > > > > invalidate the plan if the prepared plan is unsafe to execute next
> > > > > > time. It is quite possible that there are unsafe triggers on different
> > > > > > partitions and only one of them is dropped, so next time planning will
> > > > > > again yield to the same non-parallel plan. If we agree with that I
> > > > > > think it is better to add this dependency in set_plan_refs (along with
> > > > > > Gather node handling).
> > > > > >
> > > > >
> > > > > I think we should try to be consistent with current plan-cache
> > > > > functionality, rather than possibly inventing new rules for
> > > > > partitions.
> > > > > I'm finding that with current Postgres functionality (master branch),
> > > > > if there are two parallel-unsafe triggers defined on a normal table
> > > > > and one is dropped, it results in replanning and it yields the same
> > > > > (non-parallel) plan.
> > > > >
> > > >
> > > > Does such a plan have partitions access in it? Can you share the plan?
> > > >
> > >
> > > Er, no (it's just a regular table), but that was exactly my point:
> > > aren't you suggesting functionality for partitions that doesn't seem
> > > to work the same way for non-partitions?
> > >
> >
> > I don't think so. The non-parallel plan for Insert doesn't directly
> > depend on partitions so we don't need to invalidate those.
> >
>
> But the non-parallel plan was chosen (instead of a parallel plan)
> because of parallel-safety checks on the partitions, which found
> attributes of the partitions which weren't parallel-safe.
> So it's not so clear to me that the dependency doesn't exist - the
> non-parallel plan does in fact depend on the state of the partitions.
>

Hmm, I think that is not what we can consider as a dependency.

> I know you're suggesting to reduce plan-cache invalidation by only
> recording a dependency in the parallel-plan case, but I've yet to see
> that in the existing code, and in fact it seems to be inconsistent
> with the behaviour I've tested so far (one example given prior, but
> will look for more).
>

I don't see your example matches what you are saying as in it the
regular table exists in the plan whereas for the case we are
discussing partitions doesn't exist in the plan. Amit L. seems to have
given a correct explanation [1] of why we don't need to invalidate for
non-parallel plans which match my understanding.

[1] - https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message George MacKerron 2021-02-23 11:56:03 INSERT ... ON CONFLICT ... : expose INSERT vs UPDATE status
Previous Message Greg Nancarrow 2021-02-23 11:17:11 Re: Parallel INSERT (INTO ... SELECT ...)