Re: pg_stat_progress_create_index vs. parallel index builds

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_progress_create_index vs. parallel index builds
Date: 2021-06-02 13:23:23
Message-ID: 8b7088ab-f7c0-bb28-842e-0c0687daaafb@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/2/21 3:03 PM, Matthias van de Meent wrote:
> On Wed, 2 Jun 2021 at 13:57, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> Hi,
>>
>> While experimenting with parallel index builds, I've noticed a somewhat
>> strange behavior of pg_stat_progress_create_index when a btree index is
>> built with parallel workers - some of the phases seem to be missing.
>>
>> In serial (no parallelism) mode, the progress is roughly this (it's
>> always the first/last timestamp of each phase):
>>
>> | command | phase
>> -------------+--------------+----------------------------------------
>> 12:56:01 AM | CREATE INDEX | building index: scanning table
>> ...
>> 01:06:22 AM | CREATE INDEX | building index: scanning table
>> 01:06:23 AM | CREATE INDEX | building index: sorting live tuples
>> ...
>> 01:13:10 AM | CREATE INDEX | building index: sorting live tuples
>> 01:13:11 AM | CREATE INDEX | building index: loading tuples in tree
>> ...
>> 01:24:02 AM | CREATE INDEX | building index: loading tuples in tree
>>
>> So it goes through three phases:
>>
>> 1) scanning tuples
>> 2) sorting live tuples
>> 3) loading tuples in tree
>>
>> But with parallel build index build, it changes to:
>>
>> | command | phase
>> -------------+--------------+----------------------------------------
>> 11:40:48 AM | CREATE INDEX | building index: scanning table
>> ...
>> 11:47:24 AM | CREATE INDEX | building index: scanning table (scan
>> complete)
>> 11:56:22 AM | CREATE INDEX | building index: scanning table
>> 11:56:23 AM | CREATE INDEX | building index: loading tuples in tree
>> ...
>> 12:05:33 PM | CREATE INDEX | building index: loading tuples in tree
>>
>> That is, the "sorting live tuples" phase disappeared, and instead it
>> seems to be counted in the "scanning table" one, as if there was an
>> update of the phase missing.
>
>> I've only tried this on master, but I assume it behaves like this in the
>> older releases too. I wonder if this is intentional - it sure is a bit
>> misleading.
>
> This was a suprise to me as well. According to documentation in
> sortsupport.h (line 125-129) the parallel workers produce pre-sorted
> segments during the scanning phase, which are subsequently merged by
> the leader. This might mean that the 'sorting' phase is already
> finished during the 'scanning' phase by waiting for the parallel
> workers; I haven't looked further if this is the case and whether it
> could be changed to also produce the sorting metrics, but seeing as it
> is part of the parallel workers API of tuplesort, I think fixing it in
> current releases is going to be difficult.
>

Maybe. Perhaps it's more complicated to decide when to switch between
phases with parallel workers. Still, the table scan is done after ~8
minutes (based on blocks_total vs. blocks_done), yet we keep that phase
for another ~9 minutes. It seems this is where the workers do the sort,
so "sorting live tuples" seems like a more natural phase for this.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Mlodgenski 2021-06-02 13:38:39 Support for CREATE MODULE?
Previous Message Matthias van de Meent 2021-06-02 13:03:41 Re: pg_stat_progress_create_index vs. parallel index builds