Re: pg_stat_progress_create_index vs. parallel index builds

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_progress_create_index vs. parallel index builds
Date: 2021-06-11 21:24:47
Message-ID: 202106112124.3hxm2o5o2es5@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Jun-04, Greg Nancarrow wrote:

> On Thu, Jun 3, 2021 at 1:49 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> > On Wed, 2 Jun 2021 at 17:42, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >
> > > Nice. I gave it a try on the database I'm experimenting with, and it
> > > seems to be working fine. Please add it to the next CF.
> >
> > Thanks, cf available here: https://commitfest.postgresql.org/33/3149/
>
> The patch looks OK to me. It seems apparent that the lines added by
> the patch are missing from the current source in the parallel case.
>
> I tested with and without the patch, using the latest PG14 source as
> of today, and can confirm that without the patch applied, the "sorting
> live tuples" phase is not reported in the parallel-case, but with the
> patch applied it then does get reported in that case. I also confirmed
> that, as you said, the patch only addresses the usual case where the
> parallel leader participates in the parallel operation.

So, with Matthias' patch applied and some instrumentation to log (some)
parameter updates, this is what I get on building an index in parallel.
The "subphase" is parameter 10:

2021-06-09 17:04:30.692 -04 19194 WARNING: updating param 0 to 1
2021-06-09 17:04:30.692 -04 19194 WARNING: updating param 6 to 0
2021-06-09 17:04:30.692 -04 19194 WARNING: updating param 8 to 403
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 9 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 10 to 1
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 11 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 15 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 10 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING: updating param 15 to 486726
2021-06-09 17:04:37.418 -04 19194 WARNING: updating param 10 to 3 <-- this one is new
2021-06-09 17:04:42.215 -04 19194 WARNING: updating param 11 to 110000000
2021-06-09 17:04:42.215 -04 19194 WARNING: updating param 15 to 0
2021-06-09 17:04:42.215 -04 19194 WARNING: updating param 10 to 3
2021-06-09 17:04:42.237 -04 19194 WARNING: updating param 10 to 5

The thing to note is that we set the subphase to 3 twice. The first of
those is added by the patch to _bt_parallel_scan_and_sort. The second
is in _bt_leafbuild, just before setting the subphase to LEAF_LOAD. So
the change is that we now set the subphase to "sorting live tuples"
about five seconds earlier than we did previously. Seems ok. (We could
alternatively skip the progress update call in _bt_leafbuild; but those
calls are so cheap that adding a conditional jump is almost as
expensive.)
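
To make the double update concrete, here is a minimal, self-contained C
sketch. It does not use the real PostgreSQL API: mock_progress_update_param,
mock_update_if_changed, mock_leader_build and the SUBPHASE_* names are
hypothetical stand-ins, and the real update call does more than a plain
array store. Only the parameter number (10) and the values 2, 3 and 5 come
from the log above; that 2 corresponds to the scan is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numbers are the ones seen in the log. */
#define NUM_PARAMS          20
#define SUBPHASE_PARAM      10  /* "subphase" is parameter 10 */
#define SUBPHASE_SCAN        2  /* assumed: value reported while scanning */
#define SUBPHASE_SORT_LIVE   3  /* "sorting live tuples" */
#define SUBPHASE_LEAF_LOAD   5  /* LEAF_LOAD */

static int64_t progress_params[NUM_PARAMS];
static int     update_calls;

/* Mock progress update: an unconditional store into one slot. */
static void
mock_progress_update_param(int index, int64_t val)
{
    assert(index >= 0 && index < NUM_PARAMS);
    progress_params[index] = val;
    update_calls++;
}

/* The alternative from the mail: skip the store when nothing changes. */
static void
mock_update_if_changed(int index, int64_t val)
{
    assert(index >= 0 && index < NUM_PARAMS);
    if (progress_params[index] != val)
        mock_progress_update_param(index, val);
}

/* Order of subphase updates made by the leader with the patch applied. */
static void
mock_leader_build(void)
{
    mock_progress_update_param(SUBPHASE_PARAM, SUBPHASE_SCAN);
    /* ... scan finishes ... */
    mock_progress_update_param(SUBPHASE_PARAM, SUBPHASE_SORT_LIVE); /* new: scan-and-sort step */
    /* ... sort finishes ... */
    mock_progress_update_param(SUBPHASE_PARAM, SUBPHASE_SORT_LIVE); /* existing: leaf build step */
    mock_progress_update_param(SUBPHASE_PARAM, SUBPHASE_LEAF_LOAD);
}
```

Calling mock_leader_build() makes four subphase updates and leaves slot 10
at 5. The second store of 3 writes the value that is already there, so it is
harmless; mock_update_if_changed would avoid it at the cost of a compare and
branch on every call.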

(The other potential problem would be invoking the progress update calls
pointlessly when in a worker. But that's already covered, because only
the leader passes progress=true to _bt_parallel_scan_and_sort.)
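
A rough sketch of that gating, again with hypothetical names
(mock_scan_and_sort, mock_parallel_build, reported_subphase, reports_made)
rather than the real functions. It assumes the leader takes part in the
build, and it runs the "workers" in a simple loop, whereas real workers are
separate processes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SORT_LIVE_TUPLES 3      /* subphase value seen in the log */

static int64_t reported_subphase;   /* stand-in for progress parameter 10 */
static int     reports_made;

/* Shared scan-and-sort routine; only callers passing progress=true report. */
static void
mock_scan_and_sort(bool progress)
{
    /* ... scan the table and feed tuples to the sort ... */
    if (progress)
    {
        reported_subphase = SORT_LIVE_TUPLES;
        reports_made++;
    }
    /* ... perform the sort ... */
}

/* Leader participates with progress=true; every worker passes false. */
static void
mock_parallel_build(int nworkers)
{
    assert(nworkers >= 0);
    mock_scan_and_sort(true);
    for (int i = 0; i < nworkers; i++)
        mock_scan_and_sort(false);
}
```

With this shape, mock_parallel_build(4) reports the subphase exactly once, no
matter how many workers run the same routine.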

I'll push now.

--
Álvaro Herrera Valdivia, Chile
