Re: CLUSTER command progress monitor

From: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER command progress monitor
Date: 2017-09-12 12:57:32
Message-ID: 59B7D9BC.6090300@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/12 21:20, Tatsuro Yamada wrote:
> On 2017/09/11 23:38, Robert Haas wrote:
>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Thanks for the comment.
>>>
>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>> heap and write new heap" when INDEX SCAN was selected.
>>>
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless? I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes a time than quick sort.
> I'll try investigating how to get a counter from external merge sort processing.
> Is this the right way?
>
>
>>>> The patch is getting the value reported as heap_tuples_total from
>>>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can
>>>> see that value anyway if they wish. The point of the progress
>>>> counters is to expose things the user couldn't otherwise see. It's
>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>> and may be way off if the relation has recently be extended by a large
>>>> amount. I think it's pretty important that we try hard to only report
>>>> values that are known to be accurate, because users hate (and mock)
>>>> inaccurate progress reports.
>>>
>>> Do you mean to use the number of rows by using below calculation instead
>>> OldHeap->rd_rel->reltuples?
>>>
>>> estimate rows = physical table size / average row length
>>
>> No, I mean don't report it at all. The caller can do that calculation
>> if they wish, without any help from the progress reporting machinery.
>
> I see. I'll remove that column on next patch.

I will summarize the current design and future corrections before sending
the next patch.

=== Current design ===

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

* Scan method: Seq Scan
1. scanning heap (*1)
2. sorting tuples (*2)
3. writing new heap (*1)
5. swapping relation files (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

* Scan method: Index Scan
4. scan heap and write new heap (*1)
5. swapping relation files (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

VACUUM FULL command will proceed in the following sequence of phases:

1. scanning heap (*1)
5. swapping relation files (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column

The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
View "pg_catalog.pg_stat_progress_cluster"
Column | Type | Collation | Nullable | Default
---------------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
command | text | | |
phase | text | | |
scan_method | text | | |
scan_index_relid | bigint | | |
heap_tuples_total | bigint | | |
heap_tuples_scanned | bigint | | |
heap_tuples_vacuumed | bigint | | |
heap_tuples_recently_dead | bigint | | |

=== It will be changed on next patch ===

- Rename to pg_stat_progress_reolg from pg_stat_progress_cluster
- Remove heap_tuples_total column from the view
- Add a progress counter in the phase of "sorting tuples" (difficult?!)

=== My test case as a bonus ===

I share my test case of progress monitor.
If someone wants to watch the current progress monitor, you can use
this test case as a example.

[Terminal1]
Run this query on psql:

select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;

Thanks,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrien Nayrat 2017-09-12 13:27:43 Re: PG 10 release notes
Previous Message Peter Eisentraut 2017-09-12 12:53:44 Re: Create replication slot in pg_basebackup if requested and not yet present