Re: [HACKERS] CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-19 18:47:00
Message-ID: CA+TgmoYueDAgNKmPtV3AWrKCX8rQcnTeQRY7FxfhrVK63dzoeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Attached patch is a rebased document patch. :)

Attached is an updated patch. I went through this patch carefully
today, in the hopes of committing it, and I think the attached version
is pretty closet to being committable, but there's at least one open
issue remaining, as described below.

- The regression tests did not pass because expected/rules.out was not
properly updated. I fixed that.

- The documentation did not build because some tags were not properly
terminated e.g. <xref linkend='...'> rather than <xref
linkend='...'/>. I also fixed that.

- The documentation had two nearly-identical lists of phases. I
merged them into one. There might be room for some further
fine-tuning here.

- cluster_rel() had multiple places where it could return without
calling pgstat_progress_end_command(). I fixed that.

- cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND
for longer than seems necessary. I fixed that.

- copy_heap_data() zeroed out the heap-tuples-scanned,
heap-blocks-scanned, and total-heap-blocks counters when it began
PROGRESS_CLUSTER_PHASE_SORT_TUPLES and
PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES. This seems like throwing away
useful information for no good reason. I changed it not to do that in
all cases except the one mentioned in the next paragraph.

- It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED
because that counter gets reused to indicate the number of heap tuples
*written back out*, but I think that is bad design for two reasons.
First, the documentation does not explain that sometimes the number of
heap tuples scanned is really reporting the number of heap tuples
written. Second, it's bad for columns to have misleading names.
Third, it would actually be really useful to store these values in
separate columns, because then you could expect that the number tuples
written would eventually equal the number scanned, and you'd still
have the number that were scanned around so that you could clearly see
how close you were getting to rewriting the entire heap. This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?

- The comment about reporting that we are now reindexing relations was
jammed in between an existing comment and the associated code. I
moved it to a more logical place.

- The new if-statement in pg_stat_get_progress_info was missing a
space required by project style. I added the space.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
progress_monitor_for_cluster_command_v11.patch application/octet-stream 22.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-03-19 19:00:32 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Previous Message Tom Lane 2019-03-19 17:46:31 Re: Willing to fix a PQexec() in libpq module