Re: [HACKERS] CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-02-22 21:02:34
Message-ID: CA+TgmoYOjPGyxJ2QfjiTwQ+j_XG0awpu70Cc4uP2m=CE186bKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> This patch is rebased on HEAD.
> I'll tackle revising the patch based on feedbacks next month.

+ Running <command>VACUUM FULL</command> is listed in
<structname>pg_stat_progress_cluster</structname>
+ view because it uses <command>CLUSTER</command> command
internally. See <xref linkend='cluster-progress-reporting'>.

It's not really true to say that VACUUM FULL uses the CLUSTER command
internally. It's not really true. It uses a good chunk of the same
infrastructure, but it certainly doesn't use the actual command, and
it's not really the exact same thing either, because internally it's
doing a sequential scan but no sort, which never happens with CLUSTER.
I'm not sure exactly how to rephrase this, but I think we need to make
it more precise.

One idea is that maybe we should try to think of a design that could
also handle the rewriting variants of ALTER TABLE, and call it
pg_stat_progress_rewrite. Maybe that's moving the goalposts too far,
but I'm not saying we'd necessarily have to do all the work now, just
have a view that we think could also handle that. Then again, maybe
the needs are too different.

+ Whenever <command>CLUSTER</command> is running, the
+ <structname>pg_stat_progress_cluster</structname> view will contain
+ one row for each backend that is currently clustering or vacuuming
(VACUUM FULL).

That sentence contradicts itself. Just say that it contains a row for
each backend that is currently running CLUSTER or VACUUM FULL.

@@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
void
cluster(ClusterStmt *stmt, bool isTopLevel)
{
+
if (stmt->relation != NULL)
{
/* This is the single-relation case. */

Useless hunk.

@@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
heap_close(rel, NoLock);

/* Do the job. */
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
cluster_rel(tableOid, indexOid, stmt->options);
+ pgstat_progress_end_command();
}
else
{

It seems like that stuff should be inside cluster_rel().

+ /* Set reltuples to total_tuples */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
OldHeap->rd_rel->reltuples);

I object. If the user wants that, they can get it from pg_class
themselves via an SQL query. It's also an estimate, not something we
know to be accurate; I want us to only report facts here, not theories

+ pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
PROGRESS_CLUSTER_METHOD_INDEX_SCAN);

I think you should use pgstat_progress_update_multi_param() if
updating multiple parameters at the same time.

Also, some lines in this patch, such as this one, are very long.
Consider techniques to reduce the line length to 80 characters or
less, such as inserting a line break between the two arguments.

+ /* Set scan_method to NULL */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);

NULL and -1 are not the same thing.

I think that we shouldn't have both
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just
use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet,
let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably
simpler.

I agree that it's acceptable to report
PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
understand why it's valuable to do so in the context of a progress
indicator.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-02-22 21:08:14 Re: CTE Changes in PostgreSQL 12, can we have a GUC to get old behavior
Previous Message Andres Freund 2019-02-22 20:39:45 Re: CTE Changes in PostgreSQL 12, can we have a GUC to get old behavior