Re: [HACKERS] CLUSTER command progress monitor

From: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-05 08:56:39
Message-ID: e73dc8f7-1bb8-efb2-0c20-45f1b7d995d9@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert!

On 2019/03/05 11:35, Robert Haas wrote:
> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> === Current design ===
>>
>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>> Depending on which one is chosen, the command will proceed in the
>> following sequence of phases:
>>
>> * Scan method: Seq Scan
>> 0. initializing (*2)
>> 1. seq scanning heap (*1)
>> 3. sorting tuples (*2)
>> 4. writing new heap (*1)
>> 5. swapping relation files (*2)
>> 6. rebuilding index (*2)
>> 7. performing final cleanup (*2)
>>
>> * Scan method: Index Scan
>> 0. initializing (*2)
>> 2. index scanning heap (*1)
>> 5. swapping relation files (*2)
>> 6. rebuilding index (*2)
>> 7. performing final cleanup (*2)
>>
>> VACUUM FULL command will proceed in the following sequence of phases:
>>
>> 1. seq scanning heap (*1)
>> 5. swapping relation files (*2)
>> 6. rebuilding index (*2)
>> 7. performing final cleanup (*2)
>>
>> (*1): increasing the value in heap_tuples_scanned column
>> (*2): only shows the phase in the phase column
>
> All of that sounds good.
>
>> The view provides the information of CLUSTER command progress details as follows
>> # \d pg_stat_progress_cluster
>> View "pg_catalog.pg_stat_progress_cluster"
>> Column | Type | Collation | Nullable | Default
>> ---------------------------+---------+-----------+----------+---------
>> pid | integer | | |
>> datid | oid | | |
>> datname | name | | |
>> relid | oid | | |
>> command | text | | |
>> phase | text | | |
>> cluster_index_relid | bigint | | |
>> heap_tuples_scanned | bigint | | |
>> heap_tuples_vacuumed | bigint | | |
>
> Still not sure if we need heap_tuples_vacuumed. We could try to
> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> we're using a Seq Scan.

I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
copy_heap_data()
heap_beginscan()
heap_beginscan_internal()
initscan()

>> === Discussion points ===
>>
>> - Progress counter for "3. sorting tuples" phase
>> - Should we add pgstat_progress_update_param() in tuplesort.c like a
>> "trace_sort"?
>> Thanks to Peter Geoghegan for the useful advice!
>
> How would we avoid an abstraction violation?

Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.

>> - Progress counter for "6. rebuilding index" phase
>> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>> However, I'm not sure whether it is okay or not.
>
> Doesn't seem unreasonable to me.

I see, I'll add it later.

Regards,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PG Bug reporting form 2019-03-05 09:05:55 BUG #15668: Server crash in transformPartitionRangeBounds
Previous Message Iwata, Aya 2019-03-05 08:55:23 RE: libpq debug log