Re: CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER command progress monitor
Date: 2017-10-02 13:04:30
Message-ID: CA+TgmobNjKsg7t3PwDBjquYyrhBX1BeTsNEmAjfMKwEDz2ib3A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 12, 2017 at 8:20 AM, Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless? I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes more time than quicksort.
> I'll try investigating how to get a counter from external merge sort
> processing.
> Is this the right way?

Progress reporting on sorts seems like a tricky problem to me, as I
said before. In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result. There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples. On the other
hand, if work_mem is very large, the time taken for sorting each run
might itself be significant enough that we'd like to have insight into
progress. If we ignore those complexities, though, a reasonable way
of reporting progress might be to report the following:

1. blocks read from the relation
2. # of tuples we've put into the tuplesort
3. # of tuples we've extracted from the tuplesort

During the first part of the sort, (1) and (2) will be growing, and
the user can measure progress by comparing (1) to the total size of
the relation. During the final merge, (3) will be growing, eventually
becoming equal to (2), so the user can measure progress by comparing
(2) with (3).
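
To make the three counters concrete, here is a rough Python sketch of how
they might evolve and how a user could turn them into progress fractions.
Everything in it is an assumption made for illustration: the class,
field, and function names are hypothetical, the "relation" is just a list
of blocks, and an in-memory sort stands in for the real tuplesort. It is
not how tuplesort.c or any progress view actually works.

```python
from dataclasses import dataclass


@dataclass
class ClusterSortProgress:
    """Hypothetical counters for a seqscan-and-sort; names are illustrative."""
    heap_blocks_total: int      # total size of the relation, in blocks
    heap_blocks_read: int = 0   # (1) blocks read from the relation
    tuples_in: int = 0          # (2) tuples put into the tuplesort
    tuples_out: int = 0         # (3) tuples extracted from the tuplesort

    def load_fraction(self) -> float:
        """Progress of the first part: compare (1) to the relation size."""
        if self.heap_blocks_total == 0:
            return 1.0
        return self.heap_blocks_read / self.heap_blocks_total

    def merge_fraction(self) -> float:
        """Progress of the final merge: compare (3) to (2)."""
        if self.tuples_in == 0:
            return 0.0
        return self.tuples_out / self.tuples_in


def cluster_with_progress(blocks):
    """Simulate scan, sort, and extraction while updating the counters.

    `blocks` is a list of lists of sortable keys standing in for heap
    blocks. Returns the sorted tuples, the final counters, and a list of
    (load_fraction, merge_fraction) snapshots taken after each step.
    """
    progress = ClusterSortProgress(heap_blocks_total=len(blocks))
    snapshots = []
    pending = []

    # First part: (1) and (2) grow as blocks are read and tuples are fed in.
    for block in blocks:
        pending.extend(block)
        progress.heap_blocks_read += 1
        progress.tuples_in += len(block)
        snapshots.append((progress.load_fraction(), progress.merge_fraction()))

    # Stand-in for run generation and merging; a real sort would spill to tapes.
    pending.sort()

    # Final part: (3) grows until it equals (2).
    output = []
    for tup in pending:
        output.append(tup)
        progress.tuples_out += 1
        snapshots.append((progress.load_fraction(), progress.merge_fraction()))

    return output, progress, snapshots
```

Calling cluster_with_progress([[3, 1], [2], [5, 4]]) walks the load
fraction up to 1.0 over three blocks and then the merge fraction up to
1.0 over five tuples. The sketch deliberately ignores the complexities
mentioned above (multiple merge passes, long quicksorts with large
work_mem), so neither fraction says anything about time spent there.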

This approach only works for a seqscan-and-sort, though. I'm not sure
what to do about the index scan case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
