Re: parallel vacuum comments

From: Andres Freund <andres(at)anarazel(dot)de>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: parallel vacuum comments
Date: 2021-11-04 19:00:03
Message-ID: 20211104190003.l5547auul46k7d5v@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
> On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > But even though we have this space optimized bitmap thing, we actually need
> > more memory allocated for each index, making this whole thing pointless.
>
> Right. But is better to change to use booleans?

It seems very clearly better to me. We shouldn't use complicated stuff like

#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
(!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))

when there's reason / benefit.

> > - Imo it's pretty confusing to have functions like
> > lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> > vacuum or index cleanup with parallel workers.", based on
> > lps->lvshared->for_cleanup.
>
> Okay. We need to set lps->lvshared->for_cleanup to tell worker do
> either index vacuum or index cleanup. So it might be better to pass
> for_cleanup flag down to the functions in addition to setting
> lps->lvshared->for_cleanup.

Yup.

> > - I don't like some of the new names introduced in 14. E.g.
> > "do_parallel_processing" is way too generic.
>
> I listed the function names that probably needs to be renamed from
> that perspecti:
>
> * do_parallel_processing
> * do_serial_processing_for_unsafe_indexes
> * parallel_process_one_index
>
> Is there any other function that should be renamed?

parallel_processing_is_safe().

I don't like that there's three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_, and there's
(begin|end|compute)_parallel_*.

> > - On a higher level, a lot of this actually doesn't seem to belong into
> > vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
> > code is heap specific. And vacuumlazy.c is large enough without the parallel
> > code.
>
> I don't come with an idea to make them more generic. Could you
> elaborate on that?

To me the the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.

Based on a quick look
parallel_vacuum_main(), parallel_processing_is_safe(),
parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
compute_parallel_vacuum_workers(), parallel_process_one_index(),
do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
do_parallel_lazy_vacuum_all_indexes(),

don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-11-04 19:42:37 Re: parallel vacuum comments
Previous Message Peter Eisentraut 2021-11-04 18:58:54 Re: Time to drop plpython2?