Re: 回复:how to create index concurrently on partitioned table

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: 回复:how to create index concurrently on partitioned table
Date: 2020-08-09 05:00:09
Message-ID: 20200809050009.GA17986@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 08, 2020 at 01:37:44AM -0500, Justin Pryzby wrote:
> That gave me the idea to layer CIC on top of Reindex, since I think it does
> exactly what's needed.

For now, I would recommend to focus first on 0001 to add support for
partitioned tables and indexes to REINDEX. CIC is much more
complicated btw, but I am not entering in the details now.

- /*
- * This may be useful when implemented someday; but that day is not today.
- * For now, avoid erroring out when called in a multi-table context
- * (REINDEX SCHEMA) and happen to come across a partitioned table. The
- * partitions may be reindexed on their own anyway.
- */
+ /* Avoid erroring out */
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
This comment does not help, and actually this becomes incorrect as
reindex for this relkind becomes supported once 0001 is done.

+ case RELKIND_INDEX:
+ reindex_index(inhrelid, false, get_rel_persistence(inhrelid),
+ options | REINDEXOPT_REPORT_PROGRESS);
+ break;
+ case RELKIND_RELATION:
+ (void) reindex_relation(inhrelid,
+ REINDEX_REL_PROCESS_TOAST |
+ REINDEX_REL_CHECK_CONSTRAINTS,
+ options | REINDEXOPT_REPORT_PROGRESS);
ReindexPartitionedRel() fails to consider the concurrent case here for
partition indexes and tables, as reindex_index()/reindex_relation()
are the APIs used in the non-concurrent case. Once you consider the
concurrent case correctly, we also need to be careful with partitions
that have a temporary persistency (note that we don't allow partition
trees to mix persistency types, all partitions have to be temporary or
permanent).

I think that you are right to make the entry point to handle
partitioned index in ReindexIndex() and partitioned table in
ReindexTable(), but the structure of the patch should be different:
- The second portion of ReindexMultipleTables() should be moved into a
separate routine, taking in input a list of relation OIDs. This needs
to be extended a bit so as reindex_index() gets called for an index
relkind if the relpersistence is temporary or if we have a
non-concurrent reindex. The idea is that we finish with a single code
path able to work on a list of relations. And your patch adds more of
that as of ReindexPartitionedRel().
- We should *not* handle directly partitioned index and/or table in
ReindexRelationConcurrently() to not complicate the logic where we
gather all the indexes of a table/matview. So I think that the list
of partition indexes/tables to work on should be built directly in
ReindexIndex() and ReindexTable(), and then this should call the
second part of ReindexMultipleTables() refactored in the previous
point. This way, each partition index gets done individually in its
own transaction. For a partition table, all indexes of this partition
are rebuilt in the same set of transactions. For the concurrent case,
we have already reindex_concurrently_swap that it able to switch the
dependencies of two indexes within a partition tree, so we can rely on
that so as a failure in the middle of the operation never leaves the
a partition structure in an inconsistent state.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-08-09 05:21:49 Re: Allow some recovery parameters to be changed with reload
Previous Message Alexander Korotkov 2020-08-09 01:53:00 Re: LSM tree for Postgres