Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2019-01-31 18:02:35
Message-ID: CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Tue, Jan 29, 2019 at 1:59 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't know how to reduce this to something reliable enough to
> include it in the regression tests, and maybe we don't really need
> that, but it's good to know that this is not a purely theoretical
> problem. I think next I'll try to write some code to make
> execPartition.c able to cope with the situation when it arises.

OK, that seems to be pretty easy. New patch series attached. The
patch with that new logic is 0004. I've consolidated some of the
things I had as separate patches in my last post and rewritten the
commit messages to explain more clearly the purpose of each patch.

Open issues:

- For now, I haven't tried to handle the DETACH PARTITION case. I
don't think there's anything preventing someone - possibly even me -
from implementing the counter-based approach that I described in the
previous message, but I think it would be good to have some more
discussion first on whether it's acceptable to make concurrent queries
error out. I think any queries that were already up and running would
be fine, but any that were planned before the DETACH and tried to
execute afterwards would get an ERROR. That's fairly low-probability,
because normally the shared invalidation machinery would cause
replanning, but there's a race condition, so we'd have to document
something like: if you use this feature, it'll probably just work, but
you might get some funny errors in other sessions if you're unlucky.
That kinda sucks but maybe we should just suck it up. Possibly we
should consider making the concurrent behavior optional, so that if
you'd rather take blocking locks than risk errors, you have that
option. Of course I guess you could also just let people do an
explicit LOCK TABLE if that's what they want. Or we could try to
actually make it work in that case, I guess by ignoring the detached
partitions, but that seems a lot harder.

- 0003 doesn't have any handling for parallel query at this point, so
even though within a single backend a single query execution will
always get the same PartitionDesc for the same relation, the answers
might not be consistent across the parallel group. I keep going back
and forth on whether this really matters. It's too late to modify the
plan, so any relations attached since it was generated are not going
to get scanned. As for detached relations, we're talking about making
them error out, so we don't have to worry about different backends
come to different conclusions about whether they should be scanned.
But maybe we should try to be smarter instead. One concern is that
even relations that aren't scanned could still be affected because of
tuple routing, but right now parallel queries can't INSERT or UPDATE
rows anyway. Then again, maybe we should try not to add more
obstacles in the way of lifting that restriction. Then again again,
AFAICT we wouldn't be able to test that the new code is actually
solving a problem right now today, and how much untested code do we
really want in the tree? And then on the eleventh hand, maybe there
are other reasons why it's important to use the same PartitionDesc
across all parallel workers that I'm not thinking about at the moment.

- 0003 also changes the order in which locks are acquired. I am not
sure whether we care about this, especially in view of other pending
changes.

If you know of other problems, have solutions to or opinions about
these, or think the whole approach is wrong, please speak up!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0002-Ensure-that-RelationBuildPartitionDesc-sees-a-consis.patch application/octet-stream 6.4 KB
0001-Move-code-for-managing-PartitionDescs-into-a-new-fil.patch application/octet-stream 23.0 KB
0005-Reduce-the-lock-level-required-to-attach-a-partition.patch application/octet-stream 814 bytes
0004-Teach-runtime-partition-pruning-to-cope-with-concurr.patch application/octet-stream 8.3 KB
0003-Ensure-that-repeated-PartitionDesc-lookups-return-th.patch application/octet-stream 18.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arne Roland 2019-01-31 18:19:54 RE: dsa_allocate() faliure
Previous Message Tom Lane 2019-01-31 17:04:51 Re: Fast path for empty relids in check_outerjoin_delay()