Re: pg_stat_progress_create_index vs. parallel index builds

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_progress_create_index vs. parallel index builds
Date: 2021-06-09 20:53:09
Message-ID: 202106092053.nvvkonovsy7u@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-Jun-04, Greg Nancarrow wrote:

> I tested with and without the patch, using the latest PG14 source as
> of today, and can confirm that without the patch applied, the "sorting
> live tuples" phase is not reported in the parallel-case, but with the
> patch applied it then does get reported in that case. I also confirmed
> that, as you said, the patch only addresses the usual case where the
> parallel leader participates in the parallel operation.
> What is slightly puzzling to me (and perhaps digging deeper will
> reveal it) is why this "sorting live tuples" phase seems so short in
> the serial case compared to the parallel case?
> For example, in my test I created an index on a column of a table
> having 10 million records, and it took about 40 seconds, during which
> the "sorting live tuples" phase seemed to take about 8 seconds. Yet
> for the serial case, index creation took about 75 seconds, during
> which the "sorting live tuples" phase seemed to take about 1 second.

I think the reason is that scanning the table is not just scanning the
table -- it is also feeding tuples to tuplesort, which internally is
already sorting them as it receives them. So by the time you're done
scanning the relation, some (large) fraction of the sorting work is
already done, which is why the "sorting" phase is so short.

Tracing sort is not easy. we discussed this earlier; see
https://postgr.es/m/20181218210159.xtkltzm7flrwsm55@alvherre.pgsql
for example.

--
Álvaro Herrera Valdivia, Chile
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-06-09 20:56:17 Re: unnesting multirange data types
Previous Message Bossart, Nathan 2021-06-09 20:52:47 Re: Estimating HugePages Requirements?