Re: [HACKERS] CLUSTER command progress monitor

From: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-01 04:53:55
Message-ID: 76d84ded-c81c-98b1-6fa4-4551ff295008@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2019/02/23 6:02, Robert Haas wrote:
> On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> This patch is rebased on HEAD.
>> I'll tackle revising the patch based on feedbacks next month.
>
> + Running <command>VACUUM FULL</command> is listed in
> <structname>pg_stat_progress_cluster</structname>
> + view because it uses <command>CLUSTER</command> command
> internally. See <xref linkend='cluster-progress-reporting'>.
>
> It's not really true to say that VACUUM FULL uses the CLUSTER command
> internally. It's not really true. It uses a good chunk of the same
> infrastructure, but it certainly doesn't use the actual command, and
> it's not really the exact same thing either, because internally it's
> doing a sequential scan but no sort, which never happens with CLUSTER.
> I'm not sure exactly how to rephrase this, but I think we need to make
> it more precise.
>
> One idea is that maybe we should try to think of a design that could
> also handle the rewriting variants of ALTER TABLE, and call it
> pg_stat_progress_rewrite. Maybe that's moving the goalposts too far,
> but I'm not saying we'd necessarily have to do all the work now, just
> have a view that we think could also handle that. Then again, maybe
> the needs are too different.
>

Hmm..., I see.
If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of
the implementation.
For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that
it's tough work.

> + Whenever <command>CLUSTER</command> is running, the
> + <structname>pg_stat_progress_cluster</structname> view will contain
> + one row for each backend that is currently clustering or vacuuming
> (VACUUM FULL).
>
> That sentence contradicts itself. Just say that it contains a row for
> each backend that is currently running CLUSTER or VACUUM FULL.
>

Fixed.

> @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
> void
> cluster(ClusterStmt *stmt, bool isTopLevel)
> {
> +
> if (stmt->relation != NULL)
> {
> /* This is the single-relation case. */
>
> Useless hunk.
>

Fixed.

> @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
> heap_close(rel, NoLock);
>
> /* Do the job. */
> + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
> cluster_rel(tableOid, indexOid, stmt->options);
> + pgstat_progress_end_command();
> }
> else
> {
>
> It seems like that stuff should be inside cluster_rel().
>

Fixed.

> + /* Set reltuples to total_tuples */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
> OldHeap->rd_rel->reltuples);
>
> I object. If the user wants that, they can get it from pg_class
> themselves via an SQL query. It's also an estimate, not something we
> know to be accurate; I want us to only report facts here, not theories

I understand that progress monitor should only report facts, so I
removed that code.

> + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
> PROGRESS_CLUSTER_METHOD_INDEX_SCAN);
>
> I think you should use pgstat_progress_update_multi_param() if
> updating multiple parameters at the same time.
>
> Also, some lines in this patch, such as this one, are very long.
> Consider techniques to reduce the line length to 80 characters or
> less, such as inserting a line break between the two arguments.

Fixed.

> + /* Set scan_method to NULL */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);
>
> NULL and -1 are not the same thing.

Oops, fixed.

> I think that we shouldn't have both
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just
> use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet,
> let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
> PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
> PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably
> simpler.

Fixed.

> I agree that it's acceptable to report
> PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
> PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
> understand why it's valuable to do so in the context of a progress
> indicator.

Actually, I'm not sure why I added it since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.

Attached patch is wip patch.

Thanks!
Tatsuro Yamada

Attachment Content-Type Size
progress_monitor_for_cluster_command_v6.patch text/x-patch 21.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-03-01 05:16:40 Re: [HACKERS] Block level parallel vacuum
Previous Message Tom Lane 2019-03-01 04:50:16 Re: pg_partition_tree crashes for a non-defined relation