Re: Add index scan progress to pg_stat_progress_vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-11-09 03:00:30
Message-ID: 20221109030030.5wmvr76xj256cccw@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-11-04 13:27:34 +0000, Imseih (AWS), Sami wrote:
> diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
> index b4fa5f6bf8..3d5e4600dc 100644
> --- a/src/backend/access/gin/ginvacuum.c
> +++ b/src/backend/access/gin/ginvacuum.c
> @@ -633,6 +633,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
> UnlockReleaseBuffer(buffer);
> buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
> RBM_NORMAL, info->strategy);
> +
> + if (info->report_parallel_progress)
> + info->parallel_progress_callback(info->parallel_progress_arg);
> }
>
> /* right now we found leftmost page in entry's BTree */

I don't think any of these progress callbacks should be done while pinning a
buffer and ...

> @@ -677,6 +680,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
> buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
> RBM_NORMAL, info->strategy);
> LockBuffer(buffer, GIN_EXCLUSIVE);
> +
> + if (info->report_parallel_progress)
> + info->parallel_progress_callback(info->parallel_progress_arg);
> }
>
> MemoryContextDelete(gvs.tmpCxt);

... definitely not while holding a buffer lock.

I also don't understand why info->parallel_progress_callback exists? It's only
set to parallel_vacuum_progress_report(). Why make this stuff more expensive
than it has to already be?

> +parallel_vacuum_progress_report(void *arg)
> +{
> + ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
> +
> + if (IsParallelWorker())
> + return;
> +
> + pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
> + pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)));
> +}

So each of the places that call this need to make an additional external
function call for each page, just to find that there's nothing to do or to
make yet another indirect function call. This should probably a static inline
function.

This is called, for every single page, just to read the number of indexes
completed by workers? A number that barely ever changes?

This seems all wrong.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-11-09 03:02:48 Re: Asynchronous and "direct" IO support for PostgreSQL.
Previous Message Hayato Kuroda (Fujitsu) 2022-11-09 02:54:19 RE: Perform streaming logical transactions by background workers and parallel apply