Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-25 13:58:00
Message-ID: CA+HiwqH5p892hkEh87=AycBwuAW00wCmw6U4LfdKT2QizVOCzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 24, 2021 at 6:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Feb 24, 2021 at 2:14 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > But the non-parallel plan was chosen (instead of a parallel plan)
> > > > > > because of parallel-safety checks on the partitions, which found
> > > > > > attributes of the partitions which weren't parallel-safe.
> > > > > > So it's not so clear to me that the dependency doesn't exist - the
> > > > > > non-parallel plan does in fact depend on the state of the partitions.
> > > > >
> > > > > Hmm, I think that is not what we can consider as a dependency.
> > > >
> > > > Then if it's not a dependency, then we shouldn't have to check the
> > > > attributes of the partitions for parallel-safety, to determine whether
> > > > we must use a non-parallel plan (or can use a parallel plan).
> > > > Except, of course, we do have to ...
> > >
> > > I don't think the plan-dependency and checking for parallel-safety are
> > > directly related.
> >
> > That is certainly not my understanding. Why do you think that they are
> > not directly related?
> > This whole issue came about because Amit L pointed out that there is a
> > need to add partition OIDs as plan-dependencies BECAUSE the checking
> > for parallel-safety and plan-dependency are related - since now, for
> > Parallel INSERT, we're executing extra parallel-safety checks that
> > check partition properties, so the resultant plan is dependent on the
> > partitions and their properties.
>
> He has pointed out an issue when the plan is parallel and you can see
> in that example that it fails if we didn't invalidate it. For
> non-parallel plans, there won't be any such issue.

Yes. I checked around a bit (code and -hackers archive [1]) and came
away with the impression that there do not appear to be any set rules
for deciding which object changes to send an invalidation message for
(sending side: ddl, vacuum/analyze) and which items of a Query or a
Plan to track changes for (receiving side: planner, plancache). One
could say the foremost rule is to avoid broken cached plans and only
in some really obvious cases do the thing that produces a better plan
[2]. While no compromises can be made for the former, whether or not
to go for the latter probably involves some cost-benefit analysis,
something we can probably revisit.

I don't think we're compromising by not adding the partition OIDs when
the insert plan is not parallel, but the benefits of adding them in
all cases are not so clear cut that maybe it's not worth it.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] Plan invalidation design:
https://www.postgresql.org/message-id/flat/20244.1171734513%40sss.pgh.pa.us

[2] ...contradicts what I said before, but I found this comment in
DefineIndex():

/*
* The pg_index update will cause backends (including this one) to update
* relcache entries for the index itself, but we should also send a
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
* to replan; so relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);

So this invalidates any plans referencing the index's parent relation
to trigger replanning so as to take the index into account. The old
plans would not really be "unrunnable" without the index though.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-02-25 14:35:06 Re: archive_command / pg_stat_archiver & documentation
Previous Message Masahiko Sawada 2021-02-25 13:42:10 Re: 64-bit XIDs in deleted nbtree pages