Quick Links

RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From:	"houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject:	RE: [bug?] Missed parallel safety checks, and wrong parallel safety
Date:	2021-06-16 12:40:25
Message-ID:	OS0PR01MB5716202362B69D15F2BFA342940F9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Okay, but I think if we go with your suggested model where whenever
> > there is a change in parallel-safety of any function, we need to send
> > the new invalidation then I think it won't matter whether the function
> > is associated with index expression, check constraint in the table, or
> > is used in any other way.
>
> No, it will still matter, because I'm proposing that the parallel-safety of
> functions that we only access via operator classes will not even be checked.
> Also, if we decided to make the system more fine-grained - e.g. by invalidating
> on the specific OID of the function that was changed rather than doing
> something that is database-wide or global - then it matters even more.
>
> > Yeah, dealing with partitioned tables is tricky. I think if we don't
> > want to check upfront the parallel safety of all the partitions then
> > the other option as discussed could be to ask the user to specify the
> > parallel safety of partitioned tables.
>
> Just to be clear here, I don't think it really matters what we *want* to do. I don't
> think it's reasonably *possible* to check all the partitions, because we don't
> hold locks on them. When we're assessing a bunch of stuff related to an
> individual relation, we have a lock on it. I think - though we should
> double-check tablecmds.c - that this is enough to prevent all of the dependent
> objects - triggers, constraints, etc. - from changing. So the stuff we care about
> is stable. But the situation with a partitioned table is different. In that case, we
> can't even examine that stuff without locking all the partitions. And even if we
> do lock all the partitions, the stuff could change immediately afterward and we
> wouldn't know. So I think it would be difficult to make it correct.
>
> Now, maybe it could be done, and I think that's worth a little more thought. For
> example, perhaps whenever we invalidate a relation, we could also somehow
> send some new, special kind of invalidation for its parent saying, essentially,
> "hey, one of your children has changed in a way you might care about." But
> that's not good enough, because it only goes up one level. The grandparent
> would still be unaware that a change it potentially cares about has occurred
> someplace down in the partitioning hierarchy. That seems hard to patch up,
> again because of the locking rules. The child can know the OID of its parent
> without locking the parent, but it can't know the OID of its grandparent without
> locking its parent. Walking up the whole partitioning hierarchy might be an
> issue for a number of reasons, including possible deadlocks, and possible race
> conditions where we don't emit all of the right invalidations in the face of
> concurrent changes. So I don't quite see a way around this part of the problem,
> but I may well be missing something. In fact I hope I am missing something,
> because solving this problem would be really nice.

For partition, I think postgres already have the logic about recursively finding
the parent table[1]. Can we copy that logic to send serval invalid messages that
invalidate the parent and grandparent... relcache if change a partition's parallel safety ?
Although, it means we need more lock(on its parents) when the parallel safety
changed, but it seems it's not a frequent scenario and looks acceptable.

[1] In generate_partition_qual()
parentrelid = get_partition_parent(RelationGetRelid(rel), true);
parent = relation_open(parentrelid, AccessShareLock);
...
/* Add the parent's quals to the list (if any) */
if (parent->rd_rel->relispartition)
result = list_concat(generate_partition_qual(parent), my_qual);

Besides, I have a possible crazy idea that maybe it's not necessary to invalidate the
relcache when parallel safety of function is changed.

I take a look at what postgres currently behaves, and found that even if user changes
a function (CREATE OR REPLACE/ALTER FUNCTION) which is used in
objects(like: constraint or index expression or partition key expression),
the data in the relation won't be rechecked. And as the doc said[2], It is *not recommended*
to change the function which is already used in some other objects. The
recommended way to handle such a change is to drop the object, adjust the function
definition, and re-add the objects. Maybe we only care about the parallel safety
change when create or drop an object(constraint or index or partition or trigger). And
we can check the parallel safety when insert into a particular table, if find functions
not allowed in parallel mode which means someone change the function's parallel safety,
then we can invalidate the relcache and error out.

[2]https://www.postgresql.org/docs/14/ddl-constraints.html

Best regards,
houzj

In response to

Re: [bug?] Missed parallel safety checks, and wrong parallel safety at 2021-06-15 14:00:49 from Robert Haas

Responses

Re: [bug?] Missed parallel safety checks, and wrong parallel safety at 2021-06-24 03:10:25 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Korotkov	2021-06-16 12:44:29	Re: unnesting multirange data types
Previous Message	Alexander Pyhalov	2021-06-16 12:36:13	Re: postgres_fdw batching vs. (re)creating the tuple slots