Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-11-07 16:05:07
Message-ID: CA+TgmoZACOKeUf_ve730EEkzsBv3Zxk-Wu0HVdyo1jBkGDkeKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 6, 2018 at 5:09 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> - get_partition_dispatch_recurse and ExecCreatePartitionPruneState
> both call RelationGetPartitionDesc. Presumably, this means that if
> the partition descriptor gets updated on the fly, the tuple routing
> and partition dispatch code could end up with different ideas about
> which partitions exist. I think this should be fixed somehow, so that
> we only call RelationGetPartitionDesc once per query and use the
> result for everything.

I think there is deeper trouble here.
ExecSetupPartitionTupleRouting() calls find_all_inheritors() to
acquire RowExclusiveLock on the whole partitioning hierarchy. It then
calls RelationGetPartitionDispatchInfo (as a non-relcache function,
this seems poorly named) which calls get_partition_dispatch_recurse,
which does this:

/*
* We assume all tables in the partition tree were already locked
* by the caller.
*/
Relation partrel = heap_open(partrelid, NoLock);

That seems OK at present, because no new partitions can have appeared
since ExecSetupPartitionTupleRouting() acquired locks. But if we
allow new partitions to be added with only ShareUpdateExclusiveLock,
then I think there would be a problem. If a new partition OID creeps
into the partition descriptor after find_all_inheritors() and before
we fetch its partition descriptor, then we wouldn't have previously
taken a lock on it and would still be attempting to open it without a
lock, which is bad (cf. b04aeb0a053e7cf7faad89f7d47844d8ba0dc839).

Admittedly, it might be a bit hard to provoke a failure here because
I'm not exactly sure how you could trigger a relcache reload in the
critical window, but I don't think we should rely on that.

More generally, it seems a bit strange that we take the approach of
locking the entire partitioning hierarchy here regardless of which
relations the query actually knows about. If some relations have been
pruned, presumably we don't need to lock them; if/when we permit
concurrent partition, we don't need to lock any new ones that have
materialized. We're just going to end up ignoring them anyway because
there's nothing to do with the information that they are or are not
excluded from the query when they don't appear in the query plan in
the first place.

Furthermore, this whole thing looks suspiciously like more of the sort
of redundant locking that f2343653f5b2aecfc759f36dbb3fd2a61f36853e
attempted to eliminate. In light of that commit message, I'm
wondering whether the best approach would be to [1] get rid of the
find_all_inheritors call altogether and [2] somehow ensure that
get_partition_dispatch_recurse() doesn't open any tables that aren't
part of the query's range table.

Thoughts? Comments? Ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Raiskup 2018-11-07 16:06:01 Re: plruby: rb_iterate symbol clash with libruby.so
Previous Message Tom Lane 2018-11-07 15:55:13 Re: plruby: rb_iterate symbol clash with libruby.so