Re: progress reporting for partitioned REINDEX

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: progress reporting for partitioned REINDEX
Date: 2021-02-17 05:55:04
Message-ID: YCyvuBoo8PMHyf1g@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote:
> These were added to report the index and table that are currently
> being worked on in concurrent reindexes of tables, schemas and
> databases. Before that commit, it would only report up to the last
> index being prepared in phase 1, leaving the user with no info on
> which index is being rebuilt.

Nothing much to add on top of what Matthias is saying here. REINDEX
for partitioned relations builds first the full list of partitions to
work on, and then processes each one of them in a separate
transaction. This is consistent with what we do for other commands
that need to handle an object different than a non-partitioned table
or a non-partitioned index. The progress reporting has to report the
index whose storage is manipulated and its parent table.

> Why pgstat_progress_start_command specifically was chosen? That is
> because there is no method to update the
> beentry->st_progress_command_target other than through
> stat_progress_start_command, and according to the docs that field
> should contain the tableId of the index that is currently being worked
> on. This field needs a pgstat_progress_start_command because CIC / RiC
> reindexes all indexes concurrently at the same time (and not grouped
> by e.g. table), so we must re-start reporting for each index in each
> new phase in which we report data to get the heapId reported correctly
> for that index.

Using pgstat_progress_start_command() for this purpose is fine IMO.
This provides enough information for the user without complicating
more this API layer.

- if (progress)
- pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
- iRel->rd_rel->relam);
+ // Do this unconditionally?
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+ iRel->rd_rel->relam);
You cannot do that, this would clobber the progress information of any
upper layer that already reports something to the progress infra in
the backend's MyProc. CLUSTER is one example calling reindex_relation()
that does *not* want progress reporting to happen in REINDEX.

+ /* progress reporting for partitioned indexes */
+ if (relkind == RELKIND_PARTITIONED_INDEX)
+ {
+ const int progress_index[3] = {
+ PROGRESS_CREATEIDX_COMMAND,
+ PROGRESS_CREATEIDX_INDEX_OID,
+ PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+ };
This does not make sense in ReindexPartitions() IMO because this
relation is not reindexed as it has no storage, and you would lose the
context of each partition.

Something that we may want to study instead is whether we'd like to
report to the user the set of relations a REINDEX command is working
on and on which relation the work is currently done. But I am not
really sure that we need that as long a we have a VERBOSE option that
lets us know via the logs what already happened in a single command.

I see no bug here.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-02-17 06:10:43 Re: progress reporting for partitioned REINDEX
Previous Message John Naylor 2021-02-17 05:40:32 Re: [POC] verifying UTF-8 using SIMD instructions