Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-24 04:11:51
Message-ID: CAA4eK1JgK-zFu0tv2kTUaQ56mkV_SWA2fk2AJ103_bb0bSgYBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > But the non-parallel plan was chosen (instead of a parallel plan)
> > > because of parallel-safety checks on the partitions, which found
> > > attributes of the partitions which weren't parallel-safe.
> > > So it's not so clear to me that the dependency doesn't exist - the
> > > non-parallel plan does in fact depend on the state of the partitions.
> > >
> >
> > Hmm, I think that is not what we can consider as a dependency.
> >
>
> Then if it's not a dependency, then we shouldn't have to check the
> attributes of the partitions for parallel-safety, to determine whether
> we must use a non-parallel plan (or can use a parallel plan).
> Except, of course, we do have to ...
>

I don't think the plan-dependency and checking for parallel-safety are
directly related.

> > > I know you're suggesting to reduce plan-cache invalidation by only
> > > recording a dependency in the parallel-plan case, but I've yet to see
> > > that in the existing code, and in fact it seems to be inconsistent
> > > with the behaviour I've tested so far (one example given prior, but
> > > will look for more).
> > >
> >
> > I don't see your example matches what you are saying as in it the
> > regular table exists in the plan whereas for the case we are
> > discussing partitions doesn't exist in the plan. Amit L. seems to have
> > given a correct explanation [1] of why we don't need to invalidate for
> > non-parallel plans which match my understanding.
> >
> > [1] - https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com
> >
>
> Amit L's explanation was:
>
> I may be wrong but it doesn't seem to me that the possibility of
> constructing a better plan due to a given change is enough reason for
> plancache.c to invalidate plans that depend on that change. AIUI,
> plancache.c only considers a change interesting if it would *break* a
> Query or a plan.
>
> So in this case, a non-parallel plan may be slower, but it isn't
> exactly rendered *wrong* by changes that make a parallel plan
> possible.
>
>
> This explanation doesn't seem to match existing planner behavior
> AFAICS, and we should try to be consistent with existing behavior
> (rather than doing something different, for partitions specifically in
> this case).
>

I still think it matches. You have missed the important point in your
example and explanation which Amit L and I am trying to explain to
you. See below.

> Using a concrete example, to avoid any confusion, consider the
> following SQL (using unpatched Postgres, master branch):
>
>
> -- encourage use of parallel plans, where possible
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=4;
>
> create or replace function myfunc() returns boolean as $$
> begin
> return true;
> end;
> $$ language plpgsql parallel unsafe immutable;
>
> create table mytable(x int);
> insert into mytable(x) values(generate_series(1,10));
>
> prepare q as select * from mytable, myfunc();
> explain execute q;
>
> -- change myfunc to be parallel-safe, to see the effect
> -- on the planner for the same select
> create or replace function myfunc() returns boolean as $$
> begin
> return true;
> end;
> $$ language plpgsql parallel safe immutable;
>
> explain execute q;
>
>
> Here a function referenced in the SELECT statement is changed from
> parallel-unsafe to parallel-safe, to see the effect on plancache.
> According to your referenced explanation, that shouldn't be considered
> an "interesting" change by plancache.c, as it wouldn't "break" the
> previously planned and cached (non-parallel) query - the existing
> non-parallel plan could and should be used, as it still would execute
> OK, even if slower. Right?
>

No that is not what I said or maybe you have misunderstood. In your
example, if you use the verbose option, then you will see the output
as below.

postgres=# explain (verbose) execute q;
QUERY PLAN
------------------------------------------------------------------
Seq Scan on public.mytable (cost=0.00..35.50 rows=2550 width=5)
Output: mytable.x, true
(2 rows)

Here, you can see that Plan depends on function (it's return value),
so it needs to invalidated when the function changes. To, see it in a
bit different way, if you change your function as below then, you can
clearly see FunctionScan in the plan:

create or replace function myfunc(c1 int) returns int as $$
begin
return c1 * 10;
end;
$$ language plpgsql parallel unsafe;

postgres=# prepare q as select * from mytable, myfunc(2);
PREPARE
postgres=# explain select * from mytable, myfunc(2);
QUERY PLAN
-----------------------------------------------------------------
Nested Loop (cost=0.25..61.26 rows=2550 width=8)
-> Function Scan on myfunc (cost=0.25..0.26 rows=1 width=4)
-> Seq Scan on mytable (cost=0.00..35.50 rows=2550 width=4)
(3 rows)

> BUT - for the above example, it DOES cause the query to be replanned,
> and it then uses a parallel plan, instead of keeping the original
> non-parallel plan.
> This does not match the explanation about how plancache works.
>

As explained above and as far as I can understand it matches Amit L's
explanation because the plan depends on the function output.

>
>
> So what I am saying is why should this behavior be different for our
> partition case?
> Doing something different for partitions would be inconsistent with
> this behavior, would it not?
>

No, please show an example of the Insert case where the plan has
reference to partitions. If that happens then we should add those as a
dependency but AFAICT that is not the case here.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2021-02-24 05:48:20 Re: repeated decoding of prepared transactions
Previous Message k.jamison@fujitsu.com 2021-02-24 04:04:04 RE: libpq debug log