Re: Add index scan progress to pg_stat_progress_vacuum

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2023-04-06 03:28:04
Message-ID: ZC48RPeRz2DcNqqA@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 05, 2023 at 02:31:54PM +0000, Imseih (AWS), Sami wrote:
>> That seems a valid argument. I was thinking that such an asynchronous
>> state update mechanism would be a good infrastructure for progress
>> reporting of parallel operations. It might be worth considering to use
>> it in more general usage but since the current implementation is
>> minimal can we extend it in the future when we need it for other use
>> cases?
>
> I don't think we should delay this patch to design a more general
> infrastructure. I agree this can be handled by a future requirement.

Not so sure to agree on that. As the patch stands, we have a rather
generally-purposed new message type and facility (callback trigger
poke from workers to their leader) used for a not-so-general purpose,
while being documented under this not-so-general purpose, which is
progress reporting. I agree that relying on pqmq.c to force the
leader to be more sensible to refreshes is sensible, because it is
cheap, but the interface seems kind of misplaced to me. As one thing,
for example, it introduces a dependency to parallel.h to do progress
reporting without touching at backend_progress.h. Is a callback
approach combined with a counter in shared memory the best thing there
could be? Could it be worth thinking about a different design where
the value incremented and the parameters of
pgstat_progress_update_param() are passed through the 'P' message
instead? It strikes me that gathering data in the leader from a poke
of the workers is something that could be useful in so much more areas
than just the parallel index operations done in a vacuum because we do
more and more things in parallel these days, so the API interface
ought to have some attention.

As some say, the introduction of a new message type in pqmq.c would be
basically a one-way door, because we'd have to maintain it in a stable
branch. I would not take that lightly. One idea of interface that
could be used is an extra set of APIs for workers to do progress
reporting, part of backend_progress.h, where we use a pqmq message
type in a centralized location, say something like a
pgstat_progress_parallel_incr_param().

About the callback interface, we may also want to be more careful
about more things, like the handling of callback chains, or even
unregistrations of callbacks? There could be much more uses to that
than just progress reporting, though this comes to a balance of what
the leader needs to do before the workers are able to poke at it on a
periodic basis to make the refresh of the aggregated progress
reporting data more verbose. There is also an argument where we could
have each worker report their progress independently of the leader?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-04-06 03:55:56 Re: Fix a comment in basic_archive about NO_INSTALLCHECK
Previous Message Melanie Plageman 2023-04-06 03:10:00 Re: Should vacuum process config file reload more often