|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|To:||Jimmy Yih <jyih(at)vmware(dot)com>|
|Cc:||"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Gaurab Dey <gaurabd(at)vmware(dot)com>|
|Subject:||Re: Concurrent deadlock scenario with DROP INDEX on partitioned index|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Jimmy Yih <jyih(at)vmware(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Actually though, maybe you *don't* want to do this in
>> RangeVarCallbackForDropRelation. Because of point 2, it might be
>> better to run find_all_inheritors after we've successfully
>> identified and locked the direct target relation, ie do it back
>> in RemoveRelations. I've not thought hard about that, but it's
>> attractive if only because it'd mean you don't have to fix point 1.
> We think that RangeVarCallbackForDropRelation is probably the most
> correct function to place the fix. It would look a bit out-of-place
> being in RemoveRelations seeing how there's already relative DROP
> INDEX code in RangeVarCallbackForDropRelation.
I really think you made the wrong choice here. Doing the locking in
RemoveRelations leads to an extremely simple patch, as I demonstrate
below. Moreover, since RemoveRelations also has special-case code
for partitioned indexes, it's hard to argue that it mustn't cover
this case too.
Also, I think the proposed test case isn't very good, because when
I run it without applying the code patch, it fails to demonstrate
any deadlock. The query output is different, but not obviously
> Fixed in attached patch. Added another local variable
> is_partitioned_index to store the classform value. The main reason we
> need the classform is because the existing relkind and
> expected_relkind local variables would only show RELKIND_INDEX whereas
> we needed exactly RELKIND_PARTITIONED_INDEX.
Yeah. As I looked at that I realized that it was totally confusing:
at least one previous author thought that "relkind" stored the rel's
actual relkind, which it doesn't as the code stands. In particular,
in this bit:
if ((relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX) &&
relOid != oldRelOid)
state->heapOid = IndexGetRelation(relOid, true);
the test for relkind == RELKIND_PARTITIONED_INDEX is dead code
because relkind will never be that. It accidentally works anyway
because the other half of the || does the right thing, but somebody
was confused, and so will readers be.
Hence, I propose the attached. 0001 is pure refactoring: it hopefully
clears up the confusion about which "relkind" is which, and it also
saves a couple of redundant syscache fetches in RemoveRelations.
Then 0002 adds the actual bug fix as well as a test case that does
deadlock with unpatched code.
regards, tom lane
|Next Message||Alvaro Herrera||2022-03-20 18:40:06||Re: support for MERGE|
|Previous Message||Alvaro Herrera||2022-03-20 17:58:07||Re: a misbehavior of partition row movement (?)|