Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Date: 2021-04-22 09:56:25
Message-ID: CA+HiwqFgpP1LxJZOBYGt9rpvTjXXkg5qG2+Xch2Z1Q7KrqZR1A@mail.gmail.com
Lists: pgsql-hackers

(Sorry about being away from this for over a week.)

On Thu, Apr 22, 2021 at 5:39 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> While the approach in the previous email does pass the tests, I think
> (but couldn't find a test case to prove) it does so coincidentally, not
> because it is correct. If I make the test for "detached exist" use the
> cached omits-partitions-partdesc, it does fail, because we had
> previously cached one that was not yet omitting the partition. So what
> I said earlier in the thread stands: the set of partitions that are
> considered detached changes depending on what the active snapshot is,
> and therefore we *must not* cache any such descriptor.
>
> So I backtracked to my previous proposal, which saves in relcache only
> the partdesc that includes all partitions. If any partdesc is built
> that omits partitions being detached, that one must be rebuilt afresh
> each time. And to avoid potentially saving a lot of single-use
> partdescs in CacheMemoryContext, in the attached second patch (which I
> attach separately only to make it more obvious to review) I store such
> partdescs in PortalContext.
>
> Barring objections, I will get this pushed early tomorrow.

Thanks for updating the patch. I have mostly cosmetic comments.

Reading through the latest version, seeing both include_detached and
omit_detached used in different parts of the code makes it a bit hard
to keep track of what a given code path is doing with respect to
detached partitions. How about standardizing on omit_detached
throughout?
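To illustrate, the call sites would then all read in the same
direction, something like this (just a sketch; I haven't re-checked
every signature in the patch, so treat the exact argument lists as
hypothetical):

    /* pg_inherits.c callers, with the flag's sense flipped */
    children = find_inheritance_children(parentrelId, omit_detached, lockmode);

    /* and likewise for the partdesc interface */
    partdesc = RelationGetPartitionDesc(rel, omit_detached);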

* Cope with partitions concurrently being detached. When we see a
- * partition marked "detach pending", we only include it in the set of
- * visible partitions if caller requested all detached partitions, or
- * if its pg_inherits tuple's xmin is still visible to the active
- * snapshot.
+ * partition marked "detach pending", we omit it from the returned
+ * descriptor if caller requested that and the tuple's xmin does not
+ * appear in progress to the active snapshot.

It seems odd for a comment in find_inheritance_children() to talk
about the "descriptor". Maybe the earlier "set of visible
partitions" wording was fine?

- * The reason for this check is that we want to avoid seeing the
+ * The reason for this hack is that we want to avoid seeing the
* partition as alive in RI queries during REPEATABLE READ or
<snip>
+ * SERIALIZABLE transactions.

The comment doesn't quite make it clear why it is the RI query case
that necessitates this hack in the first place. Maybe spelling out how
this relates to what's going on with the partdesc would help?

+ if (likely(rel->rd_partdesc &&
+ (!rel->rd_partdesc->detached_exist || include_detached)))
+ return rel->rd_partdesc;

I think it would help to have a comment about what's going on here, in
addition to the description you already wrote for
PartitionDescData.detached_exist. Maybe something along the lines of:

===
Under normal circumstances, we just return the partdesc that was
already built. However, if the partdesc was built at a time when
there existed detach-pending partition(s), which the current caller
would rather not see (omit_detached), then we build one afresh
omitting any such partitions and return that one.
RelationBuildPartitionDesc() makes sure that any such partdescs will
disappear when the query finishes.
===

That's maybe a bit verbose but I am sure you will find a way to write
it more succinctly.
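
To see it in context, here is roughly how I picture the function
reading with that comment in place (a sketch only; it also assumes the
omit_detached naming suggested above, so the sense of the test is
inverted relative to your include_detached version):

    PartitionDesc
    RelationGetPartitionDesc(Relation rel, bool omit_detached)
    {
        /*
         * Under normal circumstances, we just return the partdesc that was
         * already built.  However, if the partdesc was built at a time when
         * there existed detach-pending partition(s), which the current
         * caller would rather not see (omit_detached), then we build one
         * afresh omitting any such partitions and return that one.
         * RelationBuildPartitionDesc() makes sure that any such partdescs
         * disappear when the query finishes.
         */
        if (likely(rel->rd_partdesc &&
                   (!rel->rd_partdesc->detached_exist || !omit_detached)))
            return rel->rd_partdesc;

        return RelationBuildPartitionDesc(rel, omit_detached);
    }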

BTW, I do feel a bit alarmed by the potential performance impact of
this. If detached_exist of a cached partdesc is true, then RI queries
invoked during a bulk DML operation would have to rebuild one for
every tuple to be checked, right? I haven't tried an actual example
yet though.

--
Amit Langote
EDB: http://www.enterprisedb.com
