Re: CLUSTER command progress monitor

From: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER command progress monitor
Date: 2017-09-12 12:20:41
Message-ID: 59B7D119.2000101@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/11 23:38, Robert Haas wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Thanks for the comment.
>>
>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>> heap and write new heap" when INDEX SCAN was selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of progress monitor of
>> cluster.
>> It doesn't report counter of sort.
>
> Doesn't that make it almost useless? I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
I know that external merge sort takes a time than quick sort.
I'll try investigating how to get a counter from external merge sort processing.
Is this the right way?

>>> The patch is getting the value reported as heap_tuples_total from
>>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can
>>> see that value anyway if they wish. The point of the progress
>>> counters is to expose things the user couldn't otherwise see. It's
>>> also not necessarily accurate: it's only an estimate in the best case,
>>> and may be way off if the relation has recently be extended by a large
>>> amount. I think it's pretty important that we try hard to only report
>>> values that are known to be accurate, because users hate (and mock)
>>> inaccurate progress reports.
>>
>> Do you mean to use the number of rows by using below calculation instead
>> OldHeap->rd_rel->reltuples?
>>
>> estimate rows = physical table size / average row length
>
> No, I mean don't report it at all. The caller can do that calculation
> if they wish, without any help from the progress reporting machinery.

I see. I'll remove that column on next patch.

Regards,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-12 12:34:43 Re: Coverage improvements of src/bin/pg_basebackup and pg_receivewal --endpos
Previous Message Peter Eisentraut 2017-09-12 12:19:07 Re: Coverage improvements of src/bin/pg_basebackup and pg_receivewal --endpos