Re: parallel vacuum comments

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: parallel vacuum comments
Date: 2021-11-09 12:53:40
Message-ID: CAD21AoBbAkaZKkNaamEFLfxNZVRX8=O+PLvYWMYFittzabbe7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 5, 2021 at 4:00 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
> > On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > But even though we have this space optimized bitmap thing, we actually need
> > > more memory allocated for each index, making this whole thing pointless.
> >
> > Right. But is better to change to use booleans?
>
> It seems very clearly better to me. We shouldn't use complicated stuff like
>
> #define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
> #define GetSharedIndStats(s) \
> ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
> #define IndStatsIsNull(s, i) \
> (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
>
> when there's reason / benefit.
>
>
> > > - Imo it's pretty confusing to have functions like
> > > lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> > > vacuum or index cleanup with parallel workers.", based on
> > > lps->lvshared->for_cleanup.
> >
> > Okay. We need to set lps->lvshared->for_cleanup to tell worker do
> > either index vacuum or index cleanup. So it might be better to pass
> > for_cleanup flag down to the functions in addition to setting
> > lps->lvshared->for_cleanup.
>
> Yup.
>
>
> > > - I don't like some of the new names introduced in 14. E.g.
> > > "do_parallel_processing" is way too generic.
> >
> > I listed the function names that probably needs to be renamed from
> > that perspecti:
> >
> > * do_parallel_processing
> > * do_serial_processing_for_unsafe_indexes
> > * parallel_process_one_index
> >
> > Is there any other function that should be renamed?
>
> parallel_processing_is_safe().
>
> I don't like that there's three different naming patterns for parallel
> things. There's do_parallel_*, there's parallel_, and there's
> (begin|end|compute)_parallel_*.
>
>
> > > - On a higher level, a lot of this actually doesn't seem to belong into
> > > vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
> > > code is heap specific. And vacuumlazy.c is large enough without the parallel
> > > code.
> >
> > I don't come with an idea to make them more generic. Could you
> > elaborate on that?
>
> To me the the job that the parallel vacuum stuff does isn't really specific to
> heap. Any table AM supporting indexes is going to need to do something pretty
> much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
> is very heap specific - you're not going to be able to reuse lazy_scan_heap()
> or such. Before the parallel vacuum stuff, the index specific code in
> vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.
>
> Based on a quick look
> parallel_vacuum_main(), parallel_processing_is_safe(),
> parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
> compute_parallel_vacuum_workers(), parallel_process_one_index(),
> do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
> do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
> do_parallel_lazy_vacuum_all_indexes(),
>
> don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
> file. Of course that requires a bit of work, because of the heavy reliance on
> LVRelState, but afaict there's not really an intrinsic need to use that.

Thanks for your explanation. Understood.

I'll update the patch accordingly.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-11-09 12:55:34 Re: Skipping logical replication transactions on subscriber side
Previous Message osumi.takamichi@fujitsu.com 2021-11-09 11:55:33 RE: Failed transaction statistics to measure the logical replication progress