Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-03-05 13:14:11
Message-ID: CAA4eK1+TYPN-LXLa2T1hj3ZYabDFQO8b-m8Jid0JA85T8UF97w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Fri, Mar 5, 2021 at 9:35 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >
> >
> > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are
> > introducing GUC (enable_parallel_dml) and table option
> > (parallel_dml_enabled) for this feature. I am a bit worried about
> > using *_dml in the names because it is quite possible that for
> > parallel updates and parallel deletes we might not need any such GUC.
> > The reason we mainly need here is due to checking of parallel-safety
> > of partitioned tables and updates/deletes handle partitioned tables
> > differently than inserts so those might not be that costly. It is
> > possible that they are costly due to a different reason but not sure
> > mapping those to one GUC or table option is a good idea. Can we
> > consider using *_insert instead? I think GUC having _insert can be
> > probably used for a parallel copy (from) as well which I think will
> > have a similar overhead.
> >
>
> I'll need to think about that one.
>

Okay, one more minor comment on the same patch:
+ /*
+ * Check if parallel_dml_enabled is enabled for the target table,
+ * if not, skip the safety checks.
+ *
+ * (Note: if the target table is partitioned, the parallel_dml_enabled
+ * option setting of the partitions are ignored).
+ */
+ rte = rt_fetch(parse->resultRelation, parse->rtable);
+ rel = table_open(rte->relid, NoLock);

I think here the patch is using NoLock based on the assumption that
the relation is already locked in parse/analyze phase, so it's better
to have a comment like the one we have in
target_rel_max_parallel_hazard.

> I may be wrong, but I would have thought at least updates would have
> similar parallel-safety checking requirements to inserts and would
> have similar potential cost issues.
>

For updates and deletes, we go via inheritance planner where we
already do some work related to partitions, so I think we might not
need to open all the partitions for parallel-safety checks as we do
for inserts. I am sure some part of the code for inserts like the one
we have max_parallel_hazard will be used for updates as well but there
will be probably some change for checking parallel-safety for
partitioned relations.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2021-03-05 13:16:35 Re: Which PG version does CVE-2021-20229 affected?
Previous Message Greg Nancarrow 2021-03-05 13:04:36 Re: Parallel INSERT (INTO ... SELECT ...)