Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-11-08 02:01:28
Message-ID: CA+TgmoZGJsy-nRFnzurhZQJtHdDh5fzJKvbvhS0byN6_46pB9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 7, 2018 at 7:06 PM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> While the find_all_inheritors() call is something I'd like to see
> gone, I assume it was done that way since an UPDATE might route a
> tuple to a partition that there is no subplan for and due to INSERT
> with VALUES not having any RangeTblEntry for any of the partitions.
> Simply, any partition which is a descendant of the target partition
> table could receive the tuple regardless of what might have been
> pruned.

Thanks. I had figured out since my email of earlier today that it was
needed in the INSERT case, but I had not thought of/discovered the
case of an UPDATE that routes a tuple to a pruned partition. I think
that latter case may not be tested in our regression tests, which is
perhaps something we ought to change.

Honestly, I *think* that the reason that find_all_inheritors() call is
there is because I had the idea that it was important to try to lock
partition hierarchies in the same order in all cases so as to avoid
spurious deadlocks. However, I don't think we're really achieving
that goal despite this code. If we arrive at this point having
already locked some relations, and then lock some more, based on
whatever got pruned, we're clearly not using a deterministic locking
order. So I think we could probably rip out the find_all_inheritors()
call here and change the NoLock in get_partition_dispatch_recurse() to
just take a lock. That's probably a worthwhile simplification and a
slight optimization regardless of anything else.

But I really think it would be better if we could also jigger this to
avoid reopening relations which the executor has already opened and
locked elsewhere. Unfortunately, I don't see a really simple way to
accomplish that. We get the OIDs of the descendents and want to know
whether there is range table entry for that OID; but there's no data
structure which answers that question at present, I believe, and
introducing one just for this purpose seems like an awful lot of new
machinery. Perhaps that new machinery would still have less
far-reaching consequences than the machinery Alvaro is proposing, but,
still, it's not very appealing.

Perhaps one idea is only open and lock partitions on demand - i.e. if
a tuple actually gets routed to them. There are good reasons to do
that independently of reducing lock levels, and we certainly couldn't
do it without having some efficient way to check whether it had
already been done. So then the mechanism wouldn't feel like so much
like a special-purpose hack just for concurrent ATTACH/DETACH. (Was
Amit Langote already working on this, or was that some other kind of
on-demand locking?)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-11-08 02:10:00 Re: ATTACH/DETACH PARTITION CONCURRENTLY
Previous Message Tomas Vondra 2018-11-08 01:57:46 Re: jsonpath