From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race between SELECT and ALTER TABLE NO INHERIT
Date: 2017-08-28 09:28:07
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I'll add this to CF2017-09.

At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <75fe42df-b1d8-89ff-596d-d9da0749e66d(at)lab(dot)ntt(dot)co(dot)jp>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> >
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms so I'd like to fix that this time. How can we move on?
> Agreed that this should be fixed.
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> >
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
> Agreed. BTW, the hash table in patch #1 does not seem to be really
> helpful. In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true? AFAICS, it does not.
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.

So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a

This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)

<< the following is another topic >>

> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock. There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one. That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code. Next commands must recreate the entry so that the
> correct thing happens based on the updated information. More precisely,
> we must invalidate the current entry. RelationClearRelation() will either
> delete the entry or rebuild it. If it's being referenced somewhere, it
> will be rebuilt. The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog. We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent). We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock. Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.

I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?


Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
ignore_no_longer_child.patch text/x-patch 7.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-08-28 09:28:31 Re: psql --batch
Previous Message Kyotaro HORIGUCHI 2017-08-28 09:26:09 Re: show "aggressive" or not in autovacuum logs