From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race between SELECT and ALTER TABLE NO INHERIT
Date: 2017-09-15 06:36:26
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> << the following is another topic >>
>>>> BTW, in the partitioned table case, the parent is always locked first
>>>> using an AccessExclusiveLock. There are other considerations in that case
>>>> such as needing to recreate the partition descriptor upon termination of
>>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
>>> Apart from the degree of concurrency, if we keep parent->children
>>> order of locking, such recreation does not seem to be
>>> needed. Maybe I'm missing something.
>> Sorry to have introduced that topic in this thread, but I will try to
>> explain anyway why things are the way they are currently:
>> Once a table is no longer a partition of the parent (detached or dropped),
>> we must make sure that the next commands in the transaction don't see it
>> as one. That information is currently present in the relcache
>> (rd_partdesc), which is used by a few callers, most notably the
>> tuple-routing code. Next commands must recreate the entry so that the
>> correct thing happens based on the updated information. More precisely,
>> we must invalidate the current entry. RelationClearRelation() will either
>> delete the entry or rebuild it. If it's being referenced somewhere, it
>> will be rebuilt. The place holding the reference may also be looking at
>> the content of rd_partdesc, which we don't require them to make a copy of,
>> so we must preserve its content while other fields of RelationData are
>> being read anew from the catalog. We don't have to preserve it if there
>> has been any change (partition added/dropped), but to make such a change
>> one would need to take a strong enough lock on the relation (parent). We
>> assume here that anyone who wants to reference rd_partdesc takes at least
>> AccessShareLock lock on the relation, and anyone who wants to change its
>> content must take a lock that will conflict with it, so
>> AccessExclusiveLock. Note that in all of this, we are only talking about
>> one relation, that is the parent, so parent -> child ordering of taking
>> locks may be irrelevant.
> I think I understand this, anyway DropInherit and DropPartition
> is different-but-at-the-same-level operations so surely needs
> amendment for drop/detach cases. Is there already a solution? Or
> reproducing steps?

Sorry, I think I forgot to reply to this. Since you seem to have chosen
the other solution (checking that child is still a child), maybe this
reply is a bit too late, but anyway.

DropInherit or NO INHERIT is seen primarily as changing a child table's
(which is the target table of the command) property that it is no longer a
child of the parent, so we lock the child table to block concurrent
operations from considering it a child of parent anymore. The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.

DropPartition or DETACH PARTITION is seen primarily as changing the parent
table's (which is the target table of the command) property that one of
the partitions is removed, so we lock the parent. Any concurrent
operations that rely on the parent's relcache to get the partition list
will wait for the session that is dropping the partition to finish, so
that they get the fresh information from the relcache (or more importantly
do not end up with information obtained from the relcache going invalid
under them without notice). Note that the lock on the partition/child is
also present and it plays more or less the the same role as it does in the
DropInherit case, but due to different order of locking, reported race
condition does not occur between SELECT on partitioned table and

By the way, I will take a look at your patch when I come back from the
vacation. Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].



In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-09-15 06:41:33 Re: Race between SELECT and ALTER TABLE NO INHERIT
Previous Message Douglas Doole 2017-09-15 06:33:01 Re: Add Roman numeral conversion to to_number