Re: Race between SELECT and ALTER TABLE NO INHERIT

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race between SELECT and ALTER TABLE NO INHERIT
Date: 2017-06-27 07:27:18
Message-ID: 75fe42df-b1d8-89ff-596d-d9da0749e66d@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
>>
>> I recall I had proposed a fix for the same thing some time ago [1].
>
> Wow. About 1.5 years ago and stuck? I have a concrete case where
> this harms so I'd like to fix that this time. How can we move on?

Agreed that this should be fixed.

Your proposed approach #1 to recheck the inheritance after obtaining the
lock on the child table seems to be a good way forward.

Approach #2 of reordering locking is a simpler solution, but does not
completely prevent the problem, because DROP TABLE child can still cause
it to occur, as you mentioned.

>>> The cause is that NO INHERIT doesn't take an exlusive lock on the
>>> parent. This allows expand_inherited_rtentry to add the child
>>> relation into appendrel after removal from the inheritance but
>>> still exists.
>>
>> Right.
>>
>>> I see two ways to fix this.
>>>
>>> The first patch adds a recheck of inheritance relationship if the
>>> corresponding attribute is missing in the child in
>>> make_inh_translation_list(). The recheck is a bit complex but it
>>> is not performed unless the sequence above is happen. It checks
>>> duplication of relid (or cycles in inheritance) following
>>> find_all_inheritors (but doing a bit different) but I'm not sure
>>> it is really useful.
>>
>> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
>> I guess your hash table based solution will do the job as far as
>> performance of this check is concerned, although I haven't checked the
>> code closely.
>
> The hash table is not crucial in the patch. Substantially it just
> recurs down looking up pg_inherits for the child. I considered
> the new index but abandoned because I thought that such case
> won't occur so frequently.

Agreed. BTW, the hash table in patch #1 does not seem to be really
helpful. In the while loop in is_descendant_of_internal(), does
hash_search() ever return found = true? AFAICS, it does not.

>>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
>>> the parent first.
>>>
>>> Since the latter has a larger impact on the current behavior and
>>> we already treat "DROP TABLE child" case in the similar way, I
>>> suppose that the first approach would be preferable.
>>
>> That makes sense.
>>
>> BTW, in the partitioned table case, the parent is always locked first
>> using an AccessExclusiveLock. There are other considerations in that case
>> such as needing to recreate the partition descriptor upon termination of
>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
>
> Apart from the degree of concurrency, if we keep parent->children
> order of locking, such recreation does not seem to be
> needed. Maybe I'm missing something.

Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:

Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one. That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code. Next commands must recreate the entry so that the
correct thing happens based on the updated information. More precisely,
we must invalidate the current entry. RelationClearRelation() will either
delete the entry or rebuild it. If it's being referenced somewhere, it
will be rebuilt. The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog. We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent). We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock. Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-06-27 07:56:32 Re: Setting pd_lower in GIN metapage
Previous Message Pavel Stehule 2017-06-27 07:06:22 Re: psql - add special variable to reflect the last query status