From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race between SELECT and ALTER TABLE NO INHERIT
Date: 2017-09-19 06:04:30
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <d88a9a64-e307-59b5-b4c3-8d7fc11cb59d(at)lab(dot)ntt(dot)co(dot)jp>
> Hi.
> On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> > << the following is another topic >>
> >
> >>>> BTW, in the partitioned table case, the parent is always locked first
> >>>> using an AccessExclusiveLock. There are other considerations in that case
> >>>> such as needing to recreate the partition descriptor upon termination of
> >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >>>
> >>> Apart from the degree of concurrency, if we keep parent->children
> >>> order of locking, such recreation does not seem to be
> >>> needed. Maybe I'm missing something.
> >>
> >> Sorry to have introduced that topic in this thread, but I will try to
> >> explain anyway why things are the way they are currently:
> >>
> >> Once a table is no longer a partition of the parent (detached or dropped),
> >> we must make sure that the next commands in the transaction don't see it
> >> as one. That information is currently present in the relcache
> >> (rd_partdesc), which is used by a few callers, most notably the
> >> tuple-routing code. Next commands must recreate the entry so that the
> >> correct thing happens based on the updated information. More precisely,
> >> we must invalidate the current entry. RelationClearRelation() will either
> >> delete the entry or rebuild it. If it's being referenced somewhere, it
> >> will be rebuilt. The place holding the reference may also be looking at
> >> the content of rd_partdesc, which we don't require them to make a copy of,
> >> so we must preserve its content while other fields of RelationData are
> >> being read anew from the catalog. We don't have to preserve it if there
> >> has been any change (partition added/dropped), but to make such a change
> >> one would need to take a strong enough lock on the relation (parent). We
> >> assume here that anyone who wants to reference rd_partdesc takes at least
> >> AccessShareLock lock on the relation, and anyone who wants to change its
> >> content must take a lock that will conflict with it, so
> >> AccessExclusiveLock. Note that in all of this, we are only talking about
> >> one relation, that is the parent, so parent -> child ordering of taking
> >> locks may be irrelevant.
> >
> > I think I understand this, anyway DropInherit and DropPartition
> > is different-but-at-the-same-level operations so surely needs
> > amendment for drop/detach cases. Is there already a solution? Or
> > reproducing steps?
> Sorry, I think I forgot to reply to this. Since you seem to have chosen
> the other solution (checking that child is still a child), maybe this
> reply is a bit too late, but anyway.

I choosed it at that time for the reason mentioned upthread, but
haven't decided which is better.

> DropInherit or NO INHERIT is seen primarily as changing a child table's
> (which is the target table of the command) property that it is no longer a
> child of the parent, so we lock the child table to block concurrent
> operations from considering it a child of parent anymore. The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.
> DropPartition or DETACH PARTITION is seen primarily as changing the parent
> table's (which is the target table of the command) property that one of
> the partitions is removed, so we lock the parent. Any concurrent
> operations that rely on the parent's relcache to get the partition list
> will wait for the session that is dropping the partition to finish, so
> that they get the fresh information from the relcache (or more importantly
> do not end up with information obtained from the relcache going invalid
> under them without notice). Note that the lock on the partition/child is
> also present and it plays more or less the the same role as it does in the
> DropInherit case, but due to different order of locking, reported race
> condition does not occur between SELECT on partitioned table and

Thank you for the explanation. I understand that the difference
comes from which of parent and children has the information about
inheritance/partitioning. DROP child and ALTER child NO INHERITS
are (I think) the only two operations that intiated from children
side. The parent-locking patch results in different solutions for
similar problems.

However, parent locking on DROPped child requires a new index
InheritsRelidIndex to avoid full scan on pg_inherits, but the NO
INHERITS case doesn't. This might be a good reason for the

I noticed that DROP TABLE partition acquires lock on the parent
in a callback for RangeVarGetRelid(Extended). The same way seems
reasonable for the NO INHERIT case. As the result the patch
becomes far small and less invasive than the previous
one. (dropinh_lock_parent_v2.patch)

As a negative PoC, attacched dropchild_lock_parent_PoC.patch only
to show how the same method on DROP TABLE case looks.

> By the way, I will take a look at your patch when I come back from the
> vacation. Meanwhile, I noticed that it needs another rebase after
> 0a480502b092 [1].

Great. Thanks.

> Thanks,
> Amit
> [1]


Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
dropinh_lock_parent_v2.patch text/x-patch 1.1 KB
dropchild_lock_parent_PoC.patch text/x-patch 2.5 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-09-19 06:08:43 Re: [PATCH] Improve geometric types
Previous Message Masahiko Sawada 2017-09-19 05:48:36 Re: Creating backup history files for backups taken from standbys