From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Parallel heap vacuum |
Date: | 2025-04-28 17:51:42 |
Message-ID: | CAD21AoDSU88Z1dHUOgLSHks=F9EPaH41PkWpzM-cTz3Xnxpyig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Apr 18, 2025 at 2:49 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> Thanks for updating the patch. I have been reviewing and below are comments for now.
Thank you for reviewing the patch!
>
> 01.
> Sorry if I forgot something, but is there a reason that
> parallel_vacuum_compute_workers() is mandatory? My fresh eye felt that the function
> can check and regards zero if the API is NULL.
I thought that making this function mandatory would be a good way for
table AM authors to indicate their intentions in terms of parallel
vacuum. On the other hand, I see your point; it would not bother table
AMs that are not going to support parallel vacuum. I'd like to hear
further opinions on this.
>
> 02. table_parallel_vacuum_estimate
> We can assert that state is not NULL.
I think the state can be NULL depending on the caller. It should not
be mandatory.
>
> 03. table_parallel_vacuum_initialize
>
> Same as 02.. Also, I think we should ensure here that table_parallel_vacuum_estimate()
> has already been called before, and current assertion might not enough because
> others can set random value here. Other functions like table_rescan_tidrange()
> checks the internal flag of the data structure, which is good for me. How do you
> feel to check pcxt->estimator has non-zero value?
>
> 04. table_parallel_vacuum_initialize_worker
> Same as 02. Also, can we esure that table_parallel_vacuum_initialize() has been
> done?
>
> 05. table_parallel_vacuum_collect_dead_items
> Same as 02. Also, can we esure that table_parallel_vacuum_initialize_worker()
> has been done?
IIUC table_parallel_vacuum_initialize() is called only by
vacuumparallel.c. which is not controllable outside of it. How can
others set a random value to nworkers?
>
> 06. table_parallel_vacuum_initialize_worker
> Comments atop function needs to be updated.
Did you refer to the following comment?
/*
* Initialize AM-specific vacuum state for worker processes.
*/
static inline void
table_parallel_vacuum_initialize_worker(Relation rel, struct
ParallelVacuumState *pvs,
struct ParallelWorkerContext *pwcxt,
void **state_out)
Which part of the comment do you think we need to update?
>
> 07.
> While playing, I found that the parallel vacuum worker can be launched more than
> pages:
>
> ```
> postgres=# CREATE TABLE var (id int);
> CREATE TABLE
> postgres=# INSERT INTO var VALUES (generate_series(1, 10000));
> INSERT 0 10000
> postgres=# VACUUM (parallel 80, verbose) var ;
> INFO: vacuuming "postgres.public.var"
> INFO: launched 80 parallel vacuum workers for collecting dead tuples (planned: 80)
> INFO: finished vacuuming "postgres.public.var": index scans: 0
> pages: 0 removed, 45 remain, 45 scanned (100.00% of total), 0 eagerly scanned
> ...
> ```
>
> I hope the minimum chunk size is a page so that this situation can reduce the performance.
> How do you feel to cap the value with rel::rd_rel::relpages in heap_parallel_vacuum_compute_workers()?
> This value is not always up-to-date but seems good candidate.
Thank you for testing the patch. Yes, we should avoid that. I think it
would be better to cap the number of workers based on the number of
chunks of blocks. I'm going to propose a new parallel scan method for
parallel lazy scan so it would make sense to choose a saner value
based on it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Corey Huinker | 2025-04-28 17:51:52 | Re: Disallow redundant indexes |
Previous Message | Masahiko Sawada | 2025-04-28 17:37:17 | Re: Parallel heap vacuum |