Re: [HACKERS] CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-05 02:35:37
Message-ID: CA+Tgmob4Xjx7mV0mmcX-r9Swf2nyhfTpMDJtgJv06JUYLYmKZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> === Current design ===
>
> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
>
> * Scan method: Seq Scan
> 0. initializing (*2)
> 1. seq scanning heap (*1)
> 3. sorting tuples (*2)
> 4. writing new heap (*1)
> 5. swapping relation files (*2)
> 6. rebuilding index (*2)
> 7. performing final cleanup (*2)
>
> * Scan method: Index Scan
> 0. initializing (*2)
> 2. index scanning heap (*1)
> 5. swapping relation files (*2)
> 6. rebuilding index (*2)
> 7. performing final cleanup (*2)
>
> VACUUM FULL command will proceed in the following sequence of phases:
>
> 1. seq scanning heap (*1)
> 5. swapping relation files (*2)
> 6. rebuilding index (*2)
> 7. performing final cleanup (*2)
>
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column

All of that sounds good.

> The view provides the information of CLUSTER command progress details as follows
> # \d pg_stat_progress_cluster
> View "pg_catalog.pg_stat_progress_cluster"
> Column | Type | Collation | Nullable | Default
> ---------------------------+---------+-----------+----------+---------
> pid | integer | | |
> datid | oid | | |
> datname | name | | |
> relid | oid | | |
> command | text | | |
> phase | text | | |
> cluster_index_relid | bigint | | |
> heap_tuples_scanned | bigint | | |
> heap_tuples_vacuumed | bigint | | |

Still not sure if we need heap_tuples_vacuumed. We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.

> === Discussion points ===
>
> - Progress counter for "3. sorting tuples" phase
> - Should we add pgstat_progress_update_param() in tuplesort.c like a
> "trace_sort"?
> Thanks to Peter Geoghegan for the useful advice!

How would we avoid an abstraction violation?

> - Progress counter for "6. rebuilding index" phase
> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
> However, I'm not sure whether it is okay or not.

Doesn't seem unreasonable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-03-05 02:41:35 Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
Previous Message Amit Kapila 2019-03-05 02:28:44 Re: Inheriting table AMs for partitioned tables