I noticed that a deadlock can occur because of the order in which locks
are taken when dropping a partition. Steps to reproduce:
1. Attach a debugger to each of two sessions, one of which will do a select
on the partitioned parent and the other will drop one of its partitions.
2. In the first debugging session, set a breakpoint at the start of
expand_inherited_rtentry() which is the first point in a select query's
processing where individual partitions will be locked (the parent will
have already been locked by the rewriter).
3. In the second session, set a breakpoint at the start of
heap_drop_with_catalog(), which is the first point in the drop command's
processing where the parent will be locked (the partition will have
already been locked by RangeVarGetRelidExtended()). When allowed to
continue, this session will wait for the first session to release the
lock on the parent.
4. In the first session, proceeding with locking of the partition will
cause it to wait for the second session, which is holding a lock on the
partition; a deadlock is detected, because the second session is in turn
waiting for the first session to release the lock on the parent.
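
To make the cycle in the steps above concrete, here is a toy model of the
lock waits. It is only a sketch and not PostgreSQL code: the LockTable class
and the session names "select" and "drop" are made up for illustration, and
it assumes that any two locks on the same relation conflict, which glosses
over the actual lock modes involved.

```python
class LockTable:
    """Toy lock table: one holder per relation, all locks conflict (assumption)."""

    def __init__(self):
        self.holder = {}      # relation name -> session holding its lock
        self.waits_for = {}   # waiting session -> session it is blocked on

    def acquire(self, session, relation):
        owner = self.holder.get(relation)
        if owner is None or owner == session:
            self.holder[relation] = session
            return "granted"
        # Record the wait edge, then look for a cycle in the wait-for graph.
        self.waits_for[session] = owner
        return "deadlock" if self._has_cycle(session) else "waiting"

    def _has_cycle(self, start):
        seen = set()
        node = start
        while node in self.waits_for:
            node = self.waits_for[node]
            if node == start:
                return True
            if node in seen:
                return False
            seen.add(node)
        return False


table = LockTable()
results = [
    table.acquire("select", "parent"),     # parent locked early in the select
    table.acquire("drop", "partition"),    # partition locked early in the drop
    table.acquire("drop", "parent"),       # step 3: drop waits on select
    table.acquire("select", "partition"),  # step 4: select waits on drop
]
print(results)
```

The last request closes the loop: "select" waits for "drop", which is
already waiting for "select".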
Attached is a patch to fix that. In the original partitioning patch, I
had aped the approach of index_drop() where the parent heap relation is
locked along with the index relation so that the parent's cached list of
indexes can be invalidated. But I failed to also ape what
RangeVarCallbackForDropRelation() does when dropping an index, which is to
lock the parent heap relation before locking the index relation at all.
In the case of dropping a partition, that means locking the parent before
locking the partition relation.
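
The reasoning behind the parent-first ordering can be sketched as a check
on lock acquisition order. This is an illustration under simplifying
assumptions, not what the patch itself does: can_deadlock() is a
hypothetical helper, every lock on the same relation is assumed to conflict,
and each session is reduced to the ordered list of relations it locks.

```python
def can_deadlock(order_a, order_b):
    """Return True if two sessions lock some pair of relations in opposite order.

    Each argument is the sequence of relation names a session locks, in order.
    Assumes every lock on the same relation conflicts (a simplification).
    """
    pos_b = {rel: i for i, rel in enumerate(order_b)}
    # Relations both sessions lock, listed in session A's order.
    shared = [rel for rel in order_a if rel in pos_b]
    positions = [pos_b[rel] for rel in shared]
    # If B's positions are not increasing, some pair is taken in reverse order.
    return positions != sorted(positions)


select_order = ["parent", "partition"]   # select: parent first, then partition
drop_before = ["partition", "parent"]    # drop before the fix
drop_after = ["parent", "partition"]     # drop with the parent locked first

print(can_deadlock(select_order, drop_before))
print(can_deadlock(select_order, drop_after))
```

With both commands taking the parent first, neither can end up holding one
lock while waiting for the other in the opposite order.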
Will add this to the open items list.