Re: Add index scan progress to pg_stat_progress_vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-03-22 03:19:27
Message-ID: CAD21AoAwsaohDMissr_fR_YHp3fzq+eJ4n2=G2Bxji8aMqPAxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the late reply.

On Tue, Mar 15, 2022 at 1:20 AM Imseih (AWS), Sami <simseih(at)amazon(dot)com> wrote:
>
> > I'm still unsure the current design of 0001 patch is better than other
> > approaches we’ve discussed. Even users who don't use parallel vacuum
> > are forced to allocate shared memory for index vacuum progress, with
> > GetMaxBackends() entries from the beginning. Also, it’s likely to
> > extend the progress tracking feature for other parallel operations in
> > the future but I think the current design is not extensible. If we
> > want to do that, we will end up creating similar things for each of
> > them or re-creating index vacuum progress tracking feature while
> > creating a common infra. It might not be a problem as of now but I'm
> > concerned that introducing a feature that is not extensible and forces
> > users to allocate additional shmem might be a blocker in the future.
> > Looking at the precedent example, When we introduce the progress
> > tracking feature, we implemented it in an extensible way. On the other
> > hand, others in this thread seem to agree with this approach, so I'd
> > like to leave it to committers.
>
> Thanks for the review!
>
> I think you make strong arguments as to why we need to take a different approach now than later.
>
> Flaws with current patch set:
>
> 1. GetMaxBackends() is a really heavy-handed overallocation of a shared memory serving a very specific purpose.
> 2. Going with the approach of a vacuum specific hash breaks the design of progress which is meant to be extensible.
> 3. Even if we go with this current approach as an interim solution, it will be a real pain in the future.
>
> With that said, v7 introduces the new infrastructure. 0001 includes the new infrastructure and 0002 takes advantage of this.
>
> This approach is the following:
>
> 1. Introduces a new API called pgstat_progress_update_param_parallel along with some others support functions. This new infrastructure is in backend_progress.c
>
> 2. There is still a shared memory involved, but the size is capped to " max_worker_processes" which is the max to how many parallel workers can be doing work at any given time. The shared memory hash includes a st_progress_param array just like the Backend Status array.

I think that there is a corner case where a parallel operation could
not perform due to the lack of a free shared hash entry, because there
is a window between a parallel worker exiting and the leader
deallocating the hash table entry.

BTW have we discussed another idea I mentioned before that we have the
leader process periodically check the number of completed indexes and
advertise it in its progress information? I'm not sure which one is
better but this idea would require only changes of vacuum code and
probably simpler than the current idea.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-03-22 03:24:23 Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Previous Message Andres Freund 2022-03-22 03:14:21 Re: Broken make dependency somewhere near win32ver.o?