RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: RE: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-07-06 01:42:20
Message-ID: OS0PR01MB5716EC1D07ACCA24373C2557941B9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Sunday, July 4, 2021 1:44 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Fri, Jul 2, 2021 at 8:16 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > > I personally think "(b) provide an option to the user to specify
> > > whether inserts can be parallelized on a relation" is the preferable
> > > option.
> > > There seems to be too many issues with the alternative of trying to
> > > determine the parallel-safety of a partitioned table automatically.
> > > I think (b) is the simplest and most consistent approach, working
> > > the same way for all table types, and without the overhead of (a).
> > > Also, I don't think (b) is difficult for the user. At worst, the
> > > user can use the provided utility-functions at development-time to
> > > verify the intended declared table parallel-safety.
> > > I can't really see some mixture of (a) and (b) being acceptable.
> >
> > Yeah, I'd like to have it be automatic, but I don't have a clear idea
> > how to make that work nicely. It's possible somebody (Tom?) can
> > suggest something that I'm overlooking, though.
>
> In general, for the non-partitioned table, where we don't have much overhead
> of checking the parallel safety and invalidation is also not a big problem so I am
> tempted to provide an automatic parallel safety check. This would enable
> parallelism for more cases wherever it is suitable without user intervention.
> OTOH, I understand that providing automatic checking might be very costly if
> the number of partitions is more. Can't we provide some mid-way where the
> parallelism is enabled by default for the normal table but for the partitioned
> table it is disabled by default and the user has to set it safe for enabling
> parallelism? I agree that such behavior might sound a bit hackish.

About invalidation for non-partitioned tables, I think there is still a
problem: when a function's parallel safety changes, it is expensive to
determine, using pg_depend, whether the function is referenced by an index,
a trigger, or some other table-related object, because that check can only
be done in each backend when it accepts the invalidation message. If we
don't do that check, then whenever a function's parallel safety changes we
have to invalidate every relation's cached safety, which does not look very
nice to me.

So, I personally think "(b) provide an option to the user to specify whether
inserts can be parallelized on a relation" is the preferable option.
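
To make the invalidation trade-off above concrete, here is a rough toy model
in Python. It is only an illustration, not PostgreSQL code: the Backend
class, its methods, and the dependency map are all hypothetical names, and
the per-relation lookup merely stands in for a pg_depend scan. It shows that
the targeted approach pays one lookup per cached relation in every backend,
while the blanket approach pays nothing up front but throws away every
cached entry.

```python
# Toy model only: none of these names are PostgreSQL APIs.
from dataclasses import dataclass, field


@dataclass
class Backend:
    # relation name -> cached parallel-safety flag held by this backend
    cache: dict = field(default_factory=dict)
    # number of simulated pg_depend lookups this backend has performed
    depend_lookups: int = 0

    def invalidate_all(self):
        """Blanket approach: drop every cached entry without any lookup."""
        dropped = len(self.cache)
        self.cache.clear()
        return dropped

    def invalidate_dependent(self, func, depends):
        """Targeted approach: drop only relations that reference func."""
        dropped = 0
        for rel in list(self.cache):
            self.depend_lookups += 1  # stands in for a pg_depend scan
            if func in depends.get(rel, ()):
                del self.cache[rel]
                dropped += 1
        return dropped


def demo():
    # Hypothetical dependency map: relation -> functions used by its
    # indexes, triggers, and other table-related objects.
    depends = {"t1": {"f"}, "t2": {"g"}, "t3": set()}
    blanket_backend = Backend(cache={rel: True for rel in depends})
    targeted_backend = Backend(cache={rel: True for rel in depends})

    # The parallel safety of function "f" changes; each backend reacts.
    blanket = blanket_backend.invalidate_all()
    targeted = targeted_backend.invalidate_dependent("f", depends)
    return (blanket, targeted, targeted_backend.depend_lookups,
            sorted(targeted_backend.cache))
```

Calling demo() with three cached relations drops all three entries in the
blanket case, and drops one entry at the cost of three lookups in the
targeted case. Both costs are repeated in every backend that receives the
message, which is the overhead option (b) avoids.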

Best regards,
houzj
