Re: Parallel heap vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-04-28 17:37:13
Message-ID: CAD21AoCsyk61yqD0KMm05S2EbdEh1JL3Mcsx8NkWiiJM0_uBVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 6, 2025 at 10:27 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-san.
>
> I was revisiting this thread after a long time. I found most of my
> previous review comments from v11-0001 were not yet addressed. I can't
> tell if they are deliberately left out, or if they are accidentally
> overlooked. Please see the details below.

Sorry I missed to address or reply these comments.

>
> On Mon, Mar 10, 2025 at 3:05 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > ======
> > src/backend/access/table/tableamapi.c
> >
> > 2.
> > Assert(routine->relation_vacuum != NULL);
> > + Assert(routine->parallel_vacuum_compute_workers != NULL);
> > Assert(routine->scan_analyze_next_block != NULL);
> >
> > Is it better to keep these Asserts in the same order that the
> > TableAmRoutine fields are assigned (in heapam_handler.c)?
> >
>
> Not yet addressed?

Will fix.

>
> > ~~~
> >
> > 3.
> > + /*
> > + * Callbacks for parallel vacuum are also optional (except for
> > + * parallel_vacuum_compute_workers). But one callback implies presence of
> > + * the others.
> > + */
> > + Assert(((((routine->parallel_vacuum_estimate == NULL) ==
> > + (routine->parallel_vacuum_initialize == NULL)) ==
> > + (routine->parallel_vacuum_initialize_worker == NULL)) ==
> > + (routine->parallel_vacuum_collect_dead_items == NULL)));
> >
> > /also optional/optional/
> >
>
> Not yet addressed?

Will fix.

>
> > ======
> > src/include/access/heapam.h
> >
> > +extern int heap_parallel_vacuum_compute_workers(Relation rel, int
> > nworkers_requested);
> >
> > 4.
> > wrong tab/space after 'int'.
>
> Not yet addressed?

Will fix.

>
> >
> > ======
> > src/include/access/tableam.h
> >
> > 5.
> > + /*
> > + * Compute the number of parallel workers for parallel table vacuum. The
> > + * parallel degree for parallel vacuum is further limited by
> > + * max_parallel_maintenance_workers. The function must return 0 to disable
> > + * parallel table vacuum.
> > + *
> > + * 'nworkers_requested' is a >=0 number and the requested number of
> > + * workers. This comes from the PARALLEL option. 0 means to choose the
> > + * parallel degree based on the table AM specific factors such as table
> > + * size.
> > + */
> > + int (*parallel_vacuum_compute_workers) (Relation rel,
> > + int nworkers_requested);
> >
> > The comment here says "This comes from the PARALLEL option." and "0
> > means to choose the parallel degree...". But, the PG docs [1] says "To
> > disable this feature, one can use PARALLEL option and specify parallel
> > workers as zero.".
> >
> > These two descriptions "disable this feature" (PG docs) and letting
> > the system "choose the parallel degree" (code comment) don't sound the
> > same. Should this 0001 patch update the PG documentation for the
> > effect of setting PARALLEL value zero?
>
> Not yet addressed?

It often happens that we treat a value differently when the user
inputs the value and when we use it internally. I think the comment
should follow how to use VacuumParams.nworkers internally, which is
described in the comment as follow:

/*
* The number of parallel vacuum workers. 0 by default which means choose
* based on the number of indexes. -1 indicates parallel vacuum is
* disabled.
*/
int nworkers;

So it seems no problem to me.

>
> >
> > ~~~
> >
> > 6.
> > + /*
> > + * Initialize DSM space for parallel table vacuum.
> > + *
> > + * Not called if parallel table vacuum is disabled.
> > + *
> > + * Optional callback, but either all other parallel vacuum callbacks need
> > + * to exist, or neither.
> > + */
> >
> > "or neither"?
> >
> > Also, saying "all other" seems incorrect because
> > parallel_vacuum_compute_workers callback must exist event if
> > parallel_vacuum_initialize does not exist.
> >
> > IMO you meant to say "all optional", and "or none".
> >
> > SUGGESTION:
> > Optional callback. Either all optional parallel vacuum callbacks need
> > to exist, or none.
> >
> > (this same issue is repeated in multiple places).
>
> Not yet addressed?

Will fix.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-04-28 17:37:17 Re: Parallel heap vacuum
Previous Message Tom Lane 2025-04-28 17:20:34 Re: Accounting for metapages in genericcostestimate()