Re: Should vacuum process config file reload more often

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-03-24 17:27:45
Message-ID: CAAKRu_a39kpGO_SxWEGAHMEHFAOACk3wwQJhRBaN+7Ke9ua_qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 24, 2023 at 1:21 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Mar 24, 2023 at 9:27 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > On Sun, Mar 19, 2023 at 7:47 AM Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > > Do we need to calculate the number of workers running with
> > > nworkers_for_balance by iterating over the running worker list? I
> > > guess autovacuum workers can increment/decrement it at the beginning
> > > and end of vacuum.
> >
> > I don't think we can do that because if a worker crashes, we have no way
> > of knowing if it had incremented or decremented the number, so we can't
> > adjust for it.
>
> What kind of crash are you concerned about? If a worker raises an
> ERROR, we can catch it in PG_CATCH() block. If it's a FATAL, we can do
> that in FreeWorkerInfo(). A PANIC error ends up crashing the entire
> server.

Yes, but what about a worker that segfaults? Since table AMs can define
relation_vacuum(), this seems like a real possibility.

I'll address your other code feedback in the next version.

I realized nworkers_for_balance should be initialized to 0 and not 1 --
1 is misleading since there are often 0 autovac workers. We just never
want to use nworkers_for_balance when it is 0. But, workers put a floor
of 1 on the number when they divide limit/nworkers_for_balance (since
they know there must be at least one worker right now since they are a
worker). I thought about whether or not they should call
autovac_balance_cost() if they find that nworkers_for_balance is 0 when
updating their own limit, but I'm not sure.

> > > I need another few readthroughs to figure out of VacuumFailsafeActive does what
> > > I think it does, and should be doing, but in general I think this is a good
> > > idea and a patch in good condition close to being committable.
>
> Another approach would be to make VacuumCostActive a ternary value:
> on, off, and never. When we trigger the failsafe mode we switch it to
> never, meaning that it never becomes active even after reloading the
> config file. A good point is that we don't need to add a new global
> variable, but I'm not sure it's better than the current approach.

Hmm, this is interesting. I don't love the word "never" since it kind of
implies a duration longer than the current table being vacuumed. But we
could find a different word or just document it well. For clarity, we
might want to call it failsafe_mode or something.

I wonder if the primary drawback to converting
LVRelState->failsafe_active to a global VacuumFailsafeActive is just the
general rule of limiting scope to the minimum needed.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-03-24 18:12:01 Re: Transparent column encryption
Previous Message Tom Lane 2023-03-24 17:20:06 Re: Make EXPLAIN generate a generic plan for a parameterized query