CLUSTER on partitioned index

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: CLUSTER on partitioned index
Date: 2020-10-28 00:33:12
Message-ID: 20201028003312.GU9241@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Forking this thread, since the existing CFs have been closed.
https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd

On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:
> > Honestly, I think you're over-thinking and over-engineering indisclustered.
> >
> > If "clusteredness" was something we offered to maintain across DML, I think
> > that might be important to provide stronger guarantees. As it is now, I don't
> > think this patch is worth changing the catalog definition.
>
> Well, this use case is new because we are discussing the relationship
> of indisclustered across multiple transactions for multiple indexes,
> so I'd rather have this discussion than not, and I have learnt
> the hard way with REINDEX that we should care a lot about the
> consistency of partition trees at any step of the operation.

indisclustered is only used as a default for "CLUSTER" (without USING). The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.

> Let's
> imagine a simple example here, take this partition tree: p (parent),
> and two partitions p1 and p2. p has two partitioned indexes i and j,
> indexes also present in p1 and p2 as i1, i2, j1 and j2. Let's assume
> that the user has done a CLUSTER on p USING i that completes, meaning
> that i, i1 and i2 have indisclustered set. Now let's assume that the
> user does a CLUSTER on p USING j this time, and that this command
> fails while processing p2, meaning that indisclustered is set for j1,
> i2, and perhaps i or j depending on what the patch does.

I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.

--
Justin

Attachment Content-Type Size
v1-0001-Implement-CLUSTER-of-partitioned-table.patch text/x-diff 16.8 KB
v1-0002-preserve-indisclustered-on-children-of-clustered-.patch text/x-diff 2.9 KB
v1-0003-Invalidate-indisclustered-when-attaching-uncluste.patch text/x-diff 8.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-10-28 00:44:32 DROP INDEX CONCURRENTLY on partitioned index
Previous Message Tomas Vondra 2020-10-27 23:56:23 Re: Resetting spilled txn statistics in pg_stat_replication