Re: CLUSTER on partitioned index

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: CLUSTER on partitioned index
Date: 2021-09-23 23:56:26
Message-ID: 20210923235626.GJ831@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:
> On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> > > I have to wonder if there really *is* a use case for CLUSTER in the
> > > first place on regular tables, let alone on partitioned tables, which
> > > are likely to be large and thus take a lot of time.
> >
> > The cluster now is done one partition at a time, so it might take a long time,
> > but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
> > (since v14) REINDEX.
>
> Note: The following review is based on the assumption that this v11
> revision was meant to contain only one patch. I put this up as a note,
> because it seemed quite limited when compared to earlier versions of
> the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

> I noticed that you store the result of find_all_inheritors(...,
> NoLock) in get_tables_to_cluster_partitioned, without taking care of
> potential concurrent partition hierarchy changes that the comment on
> find_all_inheritors warns against or documenting why it is safe, which
> sounds dangerous in case someone wants to adapt the code. One problem
> I can think of is that only storing reloid and indoid is not
> necessarily safe, as they might be reused by drop+create table running
> parallel to the CLUSTER command.

The parallel code in vacuum is expand_vacuum_rel(), which is where the
corresponding things happens for vacuum full. This patch is to make cluster()
do all the same stuff before calling cluster_rel().

What VACUUM tries to do is to avoid erroring if a partition is dropped while
cluster is running. cluster_rel() does the same thing by calling
cluster_multiple_rels() ,which uses CLUOPT_RECHECK.

If the OIDs wrapped around, I think existing vacuum could accidentally process
a new table with the same OID as a dropped partition. I think cluster would
*normally* catch that case and error in check_index_is_clusterable():
| Check that index is in fact an index on the given relation

Arguably VACUUM FULL could call cluster() (not cluster_rel()) and pass the
partitioned table rather than first expanding it. But non-full vacuum needs to
expand partitioned tables anyway.

> Apart from that, I think it would be useful (though not strictly
> necessary for this patch) if you could adapt the current CLUSTER
> progress reporting to report the progress for the whole partition
> hierarchy, instead of a new progress report for each relation, as was
> the case in earlier versions of the patchset.

Yea, but this is already true for VACUUM FULL (which uses CLUSTER and supports
partitioned tables since v10) and REINDEX.
See also https://postgr.es/m/20210216064214.GI28165@telsasoft.com

My goal is to present a minimal patch and avoid any nonessential complexity.

> The v11 patch seems quite incomplete when compared to previous
> discussions, or at the very least is very limited (no ALTER TABLE
> clustering commands for partitioned tables, but `CLUSTER ptable USING
> pindex` is supported). If v11 is the new proposed direction for ptable
> clustering, could you also document these limitations in the
> cluster.sgml and alter_table.sgml docs?

You said it's less complete, but it's is due to deliberate reduction in scope.
cluster.sgml says:
+ Clustering a partitioned table clusters each of its partitions using the
+ partition of the specified partitioned index (which must be specified).

The ALTER restriction hasn't changed, so I didn't touch the documentation.

I am still curious myself to know if this is the direction the patch should
move.

> > [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]
>
> > diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
> > ...
> > +ALTER TABLE clstrpart SET WITHOUT CLUSTER;
> > +ERROR: cannot mark index clustered in partitioned table
>
> This error message does not seem to match my expectation as a user: I
> am not trying to mark an index as clustered, and for a normal table
> "SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
> that behaviour of normal unclustered tables should be shared here as
> well. At the very least, the error message should be changed.

This is the pre-existing behavior.

> > -DROP TABLE clstrpart;
>
> I believe that this cleanup should not be fully removed, but moved to
> before '-- Test CLUSTER with external tuplesorting', as the table is
> not used after that line.

You're right - this was from when the patchset handled CLUSTER ON.
Leaving the index allows testing in pg_dump - a large part of the complexity of
the elided patches is to handle restoring a partitioned index, without
violating the rule that partitions of an clustered index must also be
clustered. I adjusted this in my local branch.

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-09-24 00:52:09 Re: Spelling change in LLVM 14 API
Previous Message Robert Haas 2021-09-23 23:43:00 Re: Gather performance analysis