Re: CLUSTER command progress monitor

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER command progress monitor
Date: 2017-09-11 17:23:01
Message-ID: CAH2-WzkFr=buahK1LMriHti_RkA=DJnd6n1ACUQ5Z8zPM29bbQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Thanks for the comment.
>>
>> As you know, the CLUSTER command chooses SEQ SCAN or INDEX SCAN as its scan
>> method based on cost estimation. In the case of SEQ SCAN, these two phases do
>> not overlap. However, with INDEX SCAN, they do overlap. Therefore I created
>> the phase "scan heap and write new heap" for when INDEX SCAN is selected.
>>
>> I agree that progress reporting for the sort is difficult. So the current
>> design of the CLUSTER progress monitor only reports the phase ("sorting
>> tuples"). It doesn't report a counter for the sort.
>
> Doesn't that make it almost useless? I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

--
Peter Geoghegan
