Re: progress reporting for partitioned REINDEX

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: progress reporting for partitioned REINDEX
Date: 2021-02-18 05:17:00
Message-ID: YC34TIOQ1zQTl0Xd@paquier.xyz
Lists: pgsql-hackers

On Wed, Feb 17, 2021 at 10:24:37AM -0600, Justin Pryzby wrote:
> When we implemented REINDEX of partitioned tables, it should've handled
> progress reporting in the fields where that's reported for CREATE INDEX.
> Or else we should document that "partitions_total/done are not populated for
> REINDEX of a partitioned table as they are for CREATE INDEX".

CREATE INDEX and REINDEX are two completely separate commands, with
separate code paths and mostly separate logic.  When it comes to
REINDEX, the information currently shown to the user is not
incorrect, but in line with what the progress reporting of ~13 is able
to do: each index gets reported with its parent table one-by-one,
depending on whether CONCURRENTLY is used or not.  This is consistent
with what ReindexMultipleTables() does for all the REINDEX commands
working on multiple objects, where each object listed previously is
processed in its own transaction.

Now, coming back to the ask, I think that if we want REINDEX to
provide some information about the list of relations to work on, we
are going to need more fields than what we have now, to report:
1) The total number of indexes REINDEX is working on for the relation
currently being processed.
2) The n-th index being worked on by REINDEX, out of the number of
indexes in 1).
3) The total number of relations a given command is working on, aka
the number of tables that REINDEX SCHEMA, DATABASE, SYSTEM or REINDEX
on a partitioned relation has accumulated.
4) The n-th relation listed in 3) currently worked on.
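
To make the relationship between 1) to 4) concrete, here is a minimal
sketch in C.  Every name in it (ReindexProgressSketch and the sketch_*
functions) is hypothetical and does not exist in PostgreSQL; it does
not use the backend's progress reporting and only illustrates how the
per-relation and per-command counters would move independently.  It
tracks "done" counts, so the n-th object being worked on is the done
count plus one.

```c
/*
 * Hypothetical counters, NOT an existing PostgreSQL structure.
 * The numbers in the comments refer to the list above.
 */
typedef struct ReindexProgressSketch
{
    long relations_total;   /* 3) relations accumulated by the command */
    long relations_done;    /* 4) relations fully processed so far */
    long indexes_total;     /* 1) indexes on the current relation */
    long indexes_done;      /* 2) indexes rebuilt on the current relation */
} ReindexProgressSketch;

/* Called once, after the command has built its list of relations. */
static void
sketch_begin_command(ReindexProgressSketch *p, long nrelations)
{
    p->relations_total = nrelations;
    p->relations_done = 0;
    p->indexes_total = 0;
    p->indexes_done = 0;
}

/* Called when moving to a relation: per-relation counters are reset. */
static void
sketch_begin_relation(ReindexProgressSketch *p, long nindexes)
{
    p->indexes_total = nindexes;
    p->indexes_done = 0;
}

/* Called after one index of the current relation has been rebuilt. */
static void
sketch_index_done(ReindexProgressSketch *p)
{
    p->indexes_done++;
}

/* Called after all indexes of the current relation have been rebuilt. */
static void
sketch_relation_done(ReindexProgressSketch *p)
{
    p->relations_done++;
}
```

The point of keeping two pairs of counters is that a single pair
cannot describe both "which relation" and "which index of that
relation" at the same time.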

The current columns partitions_total and partitions_done are partially
able to fill in the roles of 3) and 4) if we renamed those columns
to relations_total and relations_done.  Still, they could also mean 1)
and 2) in some contexts, like the number of indexes worked on for a
single relation.  So the problem is more complex than you make it
sound, and a solution needs to consider a certain number of cases to
be consistent across all the REINDEX commands that exist.  In short,
this is not only a problem related to partitioned tables.

I have no issues with documenting more precisely which commands
partitions_total and partitions_done currently apply to, by citing the
commands where these are effective.  We do that for index_relid, for
instance.
--
Michael
