Re: Should vacuum process config file reload more often

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-03-24 00:27:17
Message-ID: CAAKRu_YraQ1mtqRJSqv+rvp0Z3VSWhy19gEkAoYqRkmOFRbYXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sun, Mar 19, 2023 at 7:47 AM Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> Do we need to calculate the number of workers running with
> nworkers_for_balance by iterating over the running worker list? I
> guess autovacuum workers can increment/decrement it at the beginning
> and end of vacuum.

I don't think we can do that because if a worker crashes, we have no way
of knowing if it had incremented or decremented the number, so we can't
adjust for it.

> > > > > > Also not sure how the patch interacts with failsafe autovac and parallel
> > > > > > vacuum.
> > > > >
> > > > > Good point.
> > > > >
> > > > > When entering the failsafe mode, we disable the vacuum delays (see
> > > > > lazy_check_wraparound_failsafe()). We need to keep disabling the
> > > > > vacuum delays even after reloading the config file. One idea is to
> > > > > have another global variable indicating we're in the failsafe mode.
> > > > > vacuum_delay_point() doesn't update VacuumCostActive if the flag is
> > > > > true.
> > > >
> > > > I think we might not need to do this. Other than in
> > > > lazy_check_wraparound_failsafe(), VacuumCostActive is only updated in
> > > > two places:
> > > >
> > > > 1) in vacuum() which autovacuum will call per table. And failsafe is
> > > > reset per table as well.
> > > >
> > > > 2) in vacuum_delay_point(), but, since VacuumCostActive will already be
> > > > false when we enter vacuum_delay_point() the next time after
> > > > lazy_check_wraparound_failsafe(), we won't set VacuumCostActive there.
> > >
> > > Indeed. But does it mean that there is no code path to turn
> > > vacuum-delay on, even when vacuum_cost_delay is updated from 0 to
> > > non-0?
> >
> > Ah yes! Good point. This is true.
> > I'm not sure how to cheaply allow for re-enabling delays after disabling
> > them in the middle of a table vacuum.
> >
> > I don't see a way around checking if we need to reload the config file
> > on every call to vacuum_delay_point() (currently, we are only doing this
> > when we have to wait anyway). It seems expensive to do this check every
> > time. If we do do this, we would update VacuumCostActive when updating
> > VacuumCostDelay, and we would need a global variable keeping the
> > failsafe status, as you mentioned.
> >
> > It could be okay to say that you can only disable cost-based delays in
> > the middle of vacuuming a table (i.e. you cannot enable them if they are
> > already disabled until you start vacuuming the next table). Though maybe
> > it is weird that you can increase the delay but not re-enable it...
>
> On Mon, Mar 20, 2023 at 1:48 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > So, I thought about it some more, and I think it is a bit odd that you
> > can increase the delay and limit but not re-enable them if they were
> > disabled. And, perhaps it would be okay to check ConfigReloadPending at
> > the top of vacuum_delay_point() instead of only after sleeping. It is
> > just one more branch. We can check if VacuumCostActive is false after
> > checking if we should reload and doing so if needed and return early.
> > I've implemented that in attached v6.
> >
> > I added in the global we discussed for VacuumFailsafeActive. If we keep
> > it, we can probably remove the one in LVRelState -- as it seems
> > redundant. Let me know what you think.
>
> I think the following change is related:
>
> - if (!VacuumCostActive || InterruptPending)
> + if (InterruptPending || VacuumFailsafeActive ||
> + (!VacuumCostActive && !ConfigReloadPending))
> return;
>
> + /*
> + * Reload the configuration file if requested. This allows changes to
> + * [autovacuum_]vacuum_cost_limit and [autovacuum_]vacuum_cost_delay to
> + * take effect while a table is being vacuumed or analyzed.
> + */
> + if (ConfigReloadPending && !analyze_in_outer_xact)
> + {
> + ConfigReloadPending = false;
> + ProcessConfigFile(PGC_SIGHUP);
> + AutoVacuumUpdateDelay();
> + AutoVacuumUpdateLimit();
> + }
>
> It makes sense to me that we need to reload the config file even when
> vacuum-delay is disabled. But I think it's not convenient for users
> that we don't reload the configuration file once the failsafe is
> triggered. I think users might want to change some GUCs such as
> log_autovacuum_min_duration.

Ah, okay. Attached v7 has this change (it reloads even if failsafe is
active).

> > And, I was wondering if it was worth trying to split up the part that
> > reloads the config file and all of the autovacuum stuff. The reloading
> > of the config file by itself won't actually result in autovacuum workers
> > having updated cost delays because of them overwriting it with
> > wi_cost_delay, but it will allow VACUUM to have those updated values.
>
> It makes sense to me to have changes for overhauling the rebalance
> mechanism in a separate patch.
>
> Looking back at the original concern you mentioned[1]:
>
> speed up long-running vacuum of a large table by
> decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the
> config file is only reloaded between tables (for autovacuum) or after
> the statement (for explicit vacuum).
>
> does it make sense to have autovac_balance_cost() update workers'
> wi_cost_delay too? Autovacuum launcher already reloads the config file
> and does the rebalance. So I thought autovac_balance_cost() can update
> the cost_delay as well, and this might be a minimal change to deal
> with your concern. This doesn't have the effect for manual VACUUM but
> since vacuum delay is disabled by default it won't be a big problem.
> As for manual VACUUMs, we would need to reload the config file in
> vacuum_delay_point() as the part of your patch does. Overhauling the
> rebalance mechanism would be another patch to improve it further.

So, we can't do this without acquiring an access shared lock on every
call to vacuum_delay_point() because cost delay is a double.

I will work on a patchset with separate commits for reloading the config
file, though (with autovac not benefitting in the first commit).

On Thu, Mar 23, 2023 at 12:24 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
> +bool VacuumFailsafeActive = false;
> This needs documentation, how it's used and how it relates to failsafe_active
> in LVRelState (which it might replace(?), but until then).

Thanks! I've removed LVRelState->failsafe_active.

I've also separated the VacuumFailsafeActive change into its own commit.
I will say that that commit message needs some work.

> + pg_atomic_uint32 nworkers_for_balance;
> This needs a short oneline documentation update to the struct comment.

Done. I also prefixed with av to match the other members. I am thinking
that this variable name could be better. I want to convey that it is the
number of workers sharing a cost limit, so I considered
av_limit_sharers or something like that. I am looking to convey that
it is the number of workers amongst whom we must split the cost limit.

>
> - double wi_cost_delay;
> - int wi_cost_limit;
> - int wi_cost_limit_base;
> This change makes the below comment in do_autovacuum in need of an update:
> /*
> * Remove my info from shared memory. We could, but intentionally.
> * don't, clear wi_cost_limit and friends --- this is on the
> * assumption that we probably have more to do with similar cost
> * settings, so we don't want to give up our share of I/O for a very
> * short interval and thereby thrash the global balance.
> */

Updated to mention wi_dobalance instead.
On the topic of wi_dobalance, should we bother making it an atomic flag
instead? We would avoid taking a lock a few times, though probably not
frequently enough to matter. I was wondering if making it atomically
accessible would be less confusing than acquiring a lock only to set
one member in do_autovacuum() (and otherwise it is only read). I think
if I had to make it an atomic flag, I would reverse the logic and make
it wi_skip_balance or something like that.

> + if (av_table_option_cost_delay >= 0)
> + VacuumCostDelay = av_table_option_cost_delay;
> + else
> + VacuumCostDelay = autovacuum_vac_cost_delay >= 0 ?
> + autovacuum_vac_cost_delay : VacuumCostDelay;
> While it's a matter of personal preference, I for one would like if we reduced
> the number of ternary operators in the vacuum code, especially those mixed into
> if statements. The vacuum code is full of this already though so this isn't
> less of an objection (as it follows style) than an observation.

I agree. This one was better served as an "else if" anyway -- updated!

>
> + * note: in cost_limit, zero also means use value from elsewhere, because
> + * zero is not a valid value.
> ...
> + int vac_cost_limit = autovacuum_vac_cost_limit > 0 ?
> + autovacuum_vac_cost_limit : VacuumCostLimit;
> Not mentioning the fact that a magic value in a GUC means it's using the value
> from another GUC (which is not great IMHO), it seems we are using zero as well
> as -1 as that magic value here? (not introduced in this patch.) The docs does
> AFAICT only specify -1 as that value though. Am I missing something or is the
> code and documentation slightly out of sync?

I copied that comment from elsewhere, but, yes it is a weird situation.
So, you can set autovacuum_vacuum_cost_limit to 0, -1 or a
positive number. You can only set vacuum_cost_limit to a positive value.
The documentation mentions that setting autovacuum_vacuum_cost_limit to
-1, the default, will have it use vacuum_cost_limit. However, it says
nothing about what setting it to 0 does. In the code, everywhere assumes
if autovacuum_vacuum_cost_limit is 0 OR -1, use vacuum_cost_limit.

This is in contrast to autovacuum_vacuum_cost_delay, for which 0 means
to disable it -- so setting autovacuum_vacuum_cost_delay to 0 will
specifically not fall back to vacuum_cost_limit.

I think the problem is that 0 is not a valid cost limit (i.e. it has no
meaning like infinity/no limit), so we basically don't want to allow the
cost limit to be set to 0, but GUC values have to be a range with a max
and a min, so we can't just exclude 0 if we want to allow -1 (as far as
I know). I think it would be nice to be able to specify multiple valid
ranges for GUCs to the GUC machinery.

So, to answer your question, yes, the code and docs are a bit
out-of-sync.

> I need another few readthroughs to figure out of VacuumFailsafeActive does what
> I think it does, and should be doing, but in general I think this is a good
> idea and a patch in good condition close to being committable.

I will take a pass at splitting up the main commit into two. However, I
have attached a new version with the other specific updates discussed in
this thread. Feel free to provide review on this version in the meantime.

- Melanie

Attachment Content-Type Size
v7-0001-Make-failsafe_active-global.patch text/x-patch 5.3 KB
v7-0002-auto-vacuum-reloads-config-file-more-often.patch text/x-patch 19.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-03-24 00:39:12 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Previous Message Michael Paquier 2023-03-24 00:04:31 Re: Generate pg_stat_get_xact*() functions with Macros