Re: parallel vacuum comments

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: parallel vacuum comments
Date: 2021-12-06 04:47:06
Message-ID: CAA4eK1JnU0Fv+fgBCFuWzfmToN_oHV65w5wDg5h-ZZyxQfXnCg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've attached updated patches.
> > >
> >
> > I have a few comments on v4-0001.
>
> Thank you for the comments!
>
> > 1.
> > In parallel_vacuum_process_all_indexes(), we can combine the two
> > checks for vacuum/cleanup at the beginning of the function
>
> Agreed.
>
> > and I think
> > it is better to keep the variable name as bulkdel or cleanup instead
> > of vacuum as that is more specific and clear.
>
> I was thinking of using the terms "bulkdel" and "cleanup" instead of
> "vacuum" and "cleanup" for the same reason. That way, we could
> use "bulkdel" and "cleanup" when doing index bulk-deletion (i.e.,
> calling ambulkdelete) and index cleanup (calling
> amvacuumcleanup), respectively, and use "vacuum" when processing an
> index, i.e., doing either bulk-delete or cleanup, instead of using
> just "processing". But we already use "vacuum" and "cleanup" in many
> places, e.g., lazy_vacuum_index() and lazy_cleanup_index(). If we want
> to use "bulkdel" instead of "vacuum", I think it would be better to
> change the terminology in vacuumlazy.c thoroughly, probably in a
> separate patch.
>

Okay.

> > 2. The patch seems to be calling parallel_vacuum_should_skip_index
> > thrice even before starting parallel vacuum. It includes a call to find
> > the number of blocks, which has to be performed for each index. I
> > understand it might not be too costly to call this but it seems better
> > to remember this info like we are doing in the current code.
>
> Yes, we can bring the will_vacuum_parallel array back into the code.
> That way, we can remove the call to parallel_vacuum_should_skip_index()
> in parallel_vacuum_begin().
>
> > We can
> > probably set parallel_workers_can_process in parallel_vacuum_begin and
> > then again update in parallel_vacuum_process_all_indexes. Won't doing
> > something like that be better?
>
> parallel_workers_can_process can vary depending on whether we are doing
> bulk-deletion, the first cleanup pass, or a subsequent cleanup pass.
> What could we base parallel_workers_can_process on in
> parallel_vacuum_begin()?
>

I was thinking of setting the results of will_vacuum_parallel in
parallel_vacuum_begin().
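For reference, here is a rough sketch of what I mean, with illustrative
names (SketchState, index_is_parallel_safe, and the threshold constant are
stand-ins, not the actual patch code; the real check also consults the
index AM's amparallelvacuumoptions):

```c
/*
 * Hedged sketch: compute the per-index "can use parallel vacuum"
 * decision once, at parallel-vacuum begin time, and cache it instead of
 * re-checking (and re-reading the number of index blocks) on every pass.
 */
#include <stdbool.h>
#include <stdlib.h>

#define MIN_PARALLEL_INDEX_SCAN_BLOCKS 512	/* stand-in threshold */

typedef struct SketchState
{
	int			nindexes;
	bool	   *will_parallel_vacuum;	/* cached result per index */
} SketchState;

/* stand-in for parallel_vacuum_should_skip_index() */
static bool
index_is_parallel_safe(int index_blocks)
{
	return index_blocks >= MIN_PARALLEL_INDEX_SCAN_BLOCKS;
}

/* done once in the begin phase, so later passes just read the array */
static SketchState *
sketch_parallel_vacuum_begin(const int *index_blocks, int nindexes)
{
	SketchState *st = malloc(sizeof(SketchState));

	st->nindexes = nindexes;
	st->will_parallel_vacuum = malloc(sizeof(bool) * nindexes);
	for (int i = 0; i < nindexes; i++)
		st->will_parallel_vacuum[i] = index_is_parallel_safe(index_blocks[i]);
	return st;
}
```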

> >
> > 3. /*
> > * Copy the index bulk-deletion result returned from ambulkdelete and
> > @@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
> > * Since all vacuum workers write the bulk-deletion result at different
> > * slots we can write them without locking.
> > */
> > - if (shared_istat && !shared_istat->updated && istat_res != NULL)
> > + if (!pindstats->istat_updated && istat_res != NULL)
> > {
> > - memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
> > - shared_istat->updated = true;
> > + memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
> > + pindstats->istat_updated = true;
> >
> > /* Free the locally-allocated bulk-deletion result */
> > pfree(istat_res);
> > -
> > - /* return the pointer to the result from shared memory */
> > - return &shared_istat->istat;
> > }
> >
> > I think here we now copy the results both for local and parallel
> > cleanup. Isn't it better to write something about that in the
> > comments, as it is not clear from the current comments?
>
> What do you mean by "local cleanup"?
>

Clean-up performed by the leader backend.
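To spell out the pattern the diff moves to (a hedged, simplified sketch
with made-up types, not the patch itself): every index gets its own stats
slot in shared memory, and whichever backend processes the index, leader
or worker, copies the first ambulkdelete/amvacuumcleanup result into that
slot through the same code path:

```c
/*
 * Simplified sketch of the per-index stats slot.  Each slot is written
 * by exactly one backend (leader or worker), so no locking is needed,
 * and only the first non-NULL result is kept.
 */
#include <stdbool.h>
#include <string.h>

typedef struct SketchBulkDeleteResult
{
	long		tuples_removed;
} SketchBulkDeleteResult;

typedef struct SketchIndStats
{
	bool		istat_updated;	/* has istat been filled in yet? */
	SketchBulkDeleteResult istat;	/* stats stored inline in the slot */
} SketchIndStats;

static void
sketch_save_index_stats(SketchIndStats *slot,
						const SketchBulkDeleteResult *res)
{
	if (!slot->istat_updated && res != NULL)
	{
		memcpy(&slot->istat, res, sizeof(SketchBulkDeleteResult));
		slot->istat_updated = true;
	}
}
```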

> >
> > 4.
> > + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
> > + est_shared_len = MAXALIGN(sizeof(LVShared));
> > shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
> >
> > Do we need MAXALIGN here? I think previously we required it here
> > because immediately after this we were writing index stats but now
> > those are allocated separately. Normally, shm_toc_estimate_chunk()
> > aligns the size but sometimes we need to do it so as to avoid
> > unaligned accessing of shared mem. I am really not sure whether we
> > require it for dead_items, do you remember why it is there in the
> > first place.
>
> Indeed. I don't remember that. Probably it's an oversight.
>

Yeah, I also think so.
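For anyone following along, the rounding in question looks like this (a
self-contained rendering of the macros in src/include/c.h, assuming
MAXIMUM_ALIGNOF is 8 on this platform). Since shm_toc_estimate_chunk()
already rounds each chunk size up internally, applying MAXALIGN to the
estimate beforehand is redundant unless the caller needs the aligned
size itself:

```c
/*
 * MAXALIGN rounds a length up to the next multiple of the platform's
 * maximum alignment requirement (assumed to be 8 here).
 */
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
```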

--
With Regards,
Amit Kapila.
