Re: CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER command progress monitor
Date: 2017-09-11 14:38:44
Message-ID: CA+Tgmoa2Huap7teh5RPin+36++GbBKukoQgha5KDhJTJqKeF1g@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thanks for the comment.
>
> As you know, the CLUSTER command chooses SEQ SCAN or INDEX SCAN as its scan
> method based on cost estimation. In the case of SEQ SCAN, these two phases do
> not overlap. However, with INDEX SCAN, they overlap. Therefore I created the
> phase "scan heap and write new heap" for when INDEX SCAN is selected.
>
> I agree that progress reporting for sort is difficult. So the current design
> of the CLUSTER progress monitor only reports the phase ("sorting tuples").
> It doesn't report a counter for the sort.

Doesn't that make it almost useless? I would guess that scanning the
heap and writing the new heap would ordinarily account for most of the
runtime, or at least enough that you're going to want something more
than just knowing that's the phase you're in.

>> The patch is getting the value reported as heap_tuples_total from
>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can
>> see that value anyway if they wish. The point of the progress
>> counters is to expose things the user couldn't otherwise see. It's
>> also not necessarily accurate: it's only an estimate in the best case,
>> and may be way off if the relation has recently been extended by a large
>> amount. I think it's pretty important that we try hard to only report
>> values that are known to be accurate, because users hate (and mock)
>> inaccurate progress reports.
>
> Do you mean to use the number of rows from the calculation below instead of
> OldHeap->rd_rel->reltuples?
>
> estimate rows = physical table size / average row length

No, I mean don't report it at all. The caller can do that calculation
if they wish, without any help from the progress reporting machinery.
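
As a rough illustration of that caller-side calculation, here is a minimal
Python sketch of the formula quoted above. The function name and both
parameters are hypothetical and not part of the patch; the caller would have
to supply the byte counts (the table size could come from something like
pg_relation_size(), and the average row length is assumed to be known or
measured separately). The result is only an estimate, since it ignores page
overhead, dead tuples, and free space.

```python
def estimate_row_count(table_size_bytes: int, avg_row_bytes: float) -> int:
    """Rough estimate: physical table size / average row length.

    Both inputs are assumptions supplied by the caller; nothing here
    talks to the database or to the progress reporting machinery.
    """
    if table_size_bytes < 0:
        raise ValueError("table_size_bytes must be non-negative")
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    # Floor division: a partial row does not count as a row.
    return int(table_size_bytes // avg_row_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 pages of 8192 bytes, 100-byte rows.
    print(estimate_row_count(1000 * 8192, 100))
```

Keeping this outside the progress view means the view itself only carries
values that are known to be accurate.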

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
